Computer Architectures for DNA Self-Assembled Nanoelectronics
Alvin R. Lebeck
Department of Computer Science
Duke University
[Figure: engineered DNA nanostructures + nanoelectronics = the computers of tomorrow]
Slide 2
Acknowledgements
People
• Students: Jaidev Patwardhan, Constantin Pistol, Vijeta Johri, Sung-Ha Park, Nathan Sadler, Niranjan Soundararajan, Ben Burnham, R. Curt Harting
• Chris Dwyer, Daniel J. Sorin, Thomas H. LaBean, Jie Liu, John H. Reif, Hao Yan
• Sean Washburn, Dorothy A. Erie (UNC)
Funding
• Air Force Research Lab
• National Science Foundation (ITR)
• Duke University Office of the Provost
• Equipment from IBM & Intel
Slide 3
Current Processor Designs
• Large Complex Systems (millions/billions of transistors)
• Mature technology (CMOS)
• Precise control of entire design and fabrication process
• Lithographic process to create smaller and smaller features
  – But has limits…
• Cost of facility, high defect rates, process variation, etc.
[Figure: MOSFET transistor: gate (G), source (S), drain (D), N-doped regions in silicon]
Slide 4
The Red Brick Wall
• "Eventually, toward the end of the Roadmap or beyond, scaling of MOSFETs (transistors) will become ineffective and/or very costly, and advanced non-CMOS solutions will need to be implemented." [International Technology Roadmap for Semiconductors, 2003 Edition, Difficult Challenge #10]
[Chart: NAND delay (ps) for known CMOS nodes vs. the CMOS red brick wall]
Slide 5
The Potential Solution
• Self-Assembled Nanoelectronics
• Self-assembly
  – Molecules self-organize into stable structures (nano)
• What nanostructures?
• What nanoelectronic devices?
• How does self-assembly affect computer system design?
Slide 6
Outline
• Nanostructures & Components
• Circuit Design Issues
• Architectural Implications
• Proposed Architectures
• Defect Tolerance
• Conclusion
Slide 7
DNA Self-Assembly
• Well-defined rules for base-pair matching (see the sketch below)
  – Thermodynamics-driven hybridization
• Can specify the sequence of pairs; strands form a double helix
– Synthetic DNA
– Engineered Nanostructures
– Inexpensive lab equipment
[Figure: base pairing: Adenine (A) with Thymine (T), Cytosine (C) with Guanine (G); sticky end (tag); 20 nm tile]
[Seeman '99, Winfree et al. '98, Yan et al. '03]
Strands→Tiles→Structures
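To make the pairing rule concrete, here is a minimal Python sketch of Watson-Crick matching (an illustration, not code from the talk). The exact-match criterion is a deliberate simplification: real hybridization is thermodynamic and can bind on partial matches, which is why later slides minimize the number of unique tags.

    # Watson-Crick complements: A pairs with T, C pairs with G.
    COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

    def reverse_complement(strand):
        # The binding partner runs antiparallel, so reverse the
        # sequence and complement each base.
        return "".join(COMPLEMENT[b] for b in reversed(strand))

    def hybridizes(sticky_end, tag):
        # Idealized rule: two single strands bind when one is the
        # exact reverse complement of the other.
        return tag == reverse_complement(sticky_end)

    assert reverse_complement("ATGC") == "GCAT"
    assert hybridizes("ATGC", "GCAT")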
Slide 8
DNA-based Self-Assembly of Nanoscale Systems
• Use synthetic DNA as scaffolding for nanoelectronics
• Create circuits (nodes) using aperiodic patterning
  – Demonstrated aperiodic patterns with 20nm pitch
[FNANO ’05, Angewandte Chemie ’06, DAC ’06]
Slide 9
Nanoelectronic Components
• Many Choices / Challenges
  – Good Transistor Behavior
  – Interaction with DNA Lattice
• Crossed Nanotube Transistor [Fuhrer et al. '01]
• Demonstrated Functionalization of Tube Ends [Dwyer, et al. '02]
• Other candidates: Ring-gated, Crossed Nanorod, Crossed Carbon Nanotube FETs
[Figure: DNA-functionalized nanotube ends (A, C, G, T) [Dwyer, et al. IEEE FNANO '04]]
[Chart: NAND delay (ps): CNFET vs. known CMOS and the CMOS red brick wall]
Slide 10
Circuit Design Issues
Goal
Construct a computing system using the DNA lattice and nanoelectronic components.
Proposal
Use DNA tags (sticky ends) to place nano-components on the lattice.
1. Regularity of DNA lattice: easy to replicate simple structures on a moderate scale
2. Complexity of digital circuits: large graph with many unique nodes and edges
3. Tolerating defects: single-stranded DNA tags (sticky ends) may have partial matches (must minimize the number of unique tags); nanotubes may not work as advertised
Slide 11
Balancing Regularity & Complexity
• Array of simple objects
• Unit Cell based on lattice cavity
  – Uniform-length nanotubes
  – Minimizes # of DNA tags => reduces probability of partial match
  – 20nm x 20nm
• Two levels of interconnect
• Complex circuits on single lattice (10K FETs)
• Envision ~9 µm² node size: ~10,000 FETs + interconnect
• How to get billions or more?
[Figure: 20 nm unit cell (terminals A, B) with Vdd plane, ground plane, insulating layer, and interconnect layers]
Slide 12
Self-Assembled System
• Self-assemble ~10⁹ - 10¹² simple nodes (~10K FETs)
• Potential: Tera- to Peta-scale computing
• Random graph of small-scale nodes
  – There will be defects
  – Scaled CMOS may look similar
• How do we perform useful computation?
[Figure: 20 nm nodes joined by self-assembled interconnect; wires formed by selective metallization [Yan '03]]
Slide 13
Outline
• Nanostructures & Components
• Circuit Design Issues
• Architectural Implications
• Proposed Architectures
• Defect Tolerance
• Conclusion
Slide 14
Implications of Small Nodes
• Node: DNA grid with FETs
  – 3 µm x 3 µm node
  – Carbon nanotube [Dwyer '02]
  – Ring-gated [Skinner '05]
• Small-scale control
  – Controlled complexity only within one node
• Limited space on each node
  – Simple circuits (e.g., full adder)
• Limited communication between nodes
  – Only 4 neighbors
  – No global (long-haul) interconnect
• Limited coordination
  – Difficult to get many nodes to work together (e.g., 64-bit adder)
[Figure: 20 nm node schematic]
Slide 15
Implications of Randomness
• Self-assemble interconnect of nodes
1. Random node placement
2. Random node orientation
3. Random connectivity
4. High defect rates (assume fail-stop nodes)
• Limitations → architectural challenges
Slide 16
Architectural Challenges
• Node Design
• Utilizing Multiple Nodes
  – Each node is very simple
• Routing
• Execution Model
  – Must overcome implementation constraints
• Instruction Set
• Micro-scale Interface
Slide 17
Outline
• Nanostructures & Components
• Circuit Design Issues
• Architectural Implications
• Proposed Architectures
  – Defect Isolation & Structure
  – NANA [JETC '06]
  – SOSA [ASPLOS '06]
• Defect Tolerance
• Conclusion
Slide 18
Nano-scale Active Network Architecture
• Large-scale fabrication (10¹² nodes, 10⁹ cells)
• Via provides micro-scale interface; multiple node types
• First cut: understand issues
[Figure: system view and a single cell]
Slide 19
Defect Isolation/Structure
• Grid w/ defects → random graph
• Reverse path forwarding (RPF) [Dalal '78]
• Broadcast on all links except input [Nanoarch '05]
  – Forward broadcast if not seen before
  – Implement fail-stop nodes [Nanoarch '06]
• RPF maps out defective regions
  – No external defect map
  – Can tolerate up to 30% defective nodes
• Distributed algorithm to create spanning tree (sketched below)
• Route packets along tree
  – Up*/down*
  – Depth first
• How do we compute?
[Figure: anchor node, defective nodes, and nodes mapped after RPF, with root direction]
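A minimal Python sketch of this configuration flood (a reconstruction under simplifying assumptions: square grid, independent fail-stop defects, illustrative names; not the authors' implementation). Flooding from the anchor and having each live node adopt the first neighbor it hears from yields a spanning tree of the reachable region, with no external defect map:

    import random
    from collections import deque

    def rpf_spanning_tree(width, height, defect_rate, anchor=(0, 0), seed=0):
        rng = random.Random(seed)
        # Fail-stop defects: a defective node never forwards anything.
        alive = {(x, y) for x in range(width) for y in range(height)
                 if rng.random() >= defect_rate or (x, y) == anchor}
        parent = {anchor: None}          # tree edges of the mapped region
        frontier = deque([anchor])
        while frontier:
            x, y = frontier.popleft()
            for nbr in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if nbr in alive and nbr not in parent:  # forward only if unseen
                    parent[nbr] = (x, y)
                    frontier.append(nbr)
        return parent                    # reachable nodes + spanning tree

    tree = rpf_spanning_tree(100, 100, defect_rate=0.30)
    print(f"{len(tree)} of {100 * 100} nodes mapped into the tree")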
Slide 20
NANA: Computing on a Random Graph
• Perform 3 operations: Add, Add, Multiply
• Search along path for correct blocks to perform function
• Execution packets carry operation and values
• Proof-of-concept simulations
[Figure: an execution packet enters the random graph, visits add (+), subtract (-), and multiply (x) blocks along its path, and exits]
Slide 21
NANA: Execution Model & ISA
• Accumulator-based ISA
• Carry data and instructions in a "packet"
• Use bit-serial processing elements (see the sketch below)
  – Each element operates on one bit at a time
  – Minimize inter-bit communication
[Packet format: Header | op 1 | op 2 | op 3 | A0 B0 C0 D0 | A1 B1 C1 D1 | … | A31 B31 C31 D31 | Tail: opcodes followed by bit-interleaved operands]
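As a concrete illustration of bit-serial processing on such a packet, here is a Python sketch (mine; the talk gives no code) of an LSB-first serial adder. Each step consumes one bit per operand, and only a 1-bit carry survives between steps, which is exactly the inter-bit communication being minimized:

    def bit_serial_add(a_bits, b_bits):
        # Operand bits arrive LSB first, as in the interleaved packet.
        carry, out = 0, []
        for a, b in zip(a_bits, b_bits):
            out.append(a ^ b ^ carry)                # full-adder sum bit
            carry = (a & b) | (carry & (a ^ b))      # full-adder carry-out
        return out

    def to_bits(v, width=32):
        return [(v >> i) & 1 for i in range(width)]  # LSB-first stream

    def from_bits(bits):
        return sum(b << i for i, b in enumerate(bits))

    assert from_bits(bit_serial_add(to_bits(1234), to_bits(5678))) == 6912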
Slide 22
NANA: System Overview
• Simple programs
  – Fibonacci
  – String compare
• Utilization is low
• Divide 10¹² nodes into 10⁹ cells
• Peak performance potentially higher than IBM Blue Gene and NEC Earth Simulator
• Need to use more nodes!
[Chart: log peak performance (bit ops/sec): DNA-SA vs. BlueGene vs. EarthSim]
Slide 23
Self-Assembled System (recap of Slide 12, extended)
• Group many nodes into a SIMD PE
• PEs connected in logical ring
• Familiar data parallel programming
[Figure: control processor driving a logical ring of SIMD PEs built from self-assembled nodes]
Slide 24
Self-Organizing SIMD Architecture (SOSA)
• Nodes grouped to form SIMD Processing Element (PE)
  – Head, Tail, N computation nodes (k-wide bit-slice of PE)
• Configuration: depth-first traversal of spanning tree (see the sketch below)
  – Orders nodes within PE (Head → LSB → … → MSB → Tail)
  – Orders PEs
• Many SIMD PEs on logical ring → familiar data parallel programming abstraction
[Figure: spanning tree rooted at the via; tree edges and PE boundaries over numbered nodes]
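The configuration step can be sketched in a few lines of Python (an illustration; the tree shape, group size, and names are assumptions, not the authors' code). A depth-first traversal linearizes the spanning tree, and consecutive runs of nodes become PEs with a head, the LSB-through-MSB compute nodes, and a tail:

    def dfs_order(children, root):
        # Preorder depth-first traversal of the spanning tree.
        order, stack = [], [root]
        while stack:
            node = stack.pop()
            order.append(node)
            stack.extend(reversed(children.get(node, [])))
        return order

    def group_into_pes(order, compute_nodes_per_pe):
        size = compute_nodes_per_pe + 2       # plus head and tail nodes
        return [{"head": order[i],
                 "compute": order[i + 1:i + size - 1],  # LSB ... MSB
                 "tail": order[i + size - 1]}
                for i in range(0, len(order) - size + 1, size)]

    children = {0: [1, 4], 1: [2, 3], 4: [5, 6, 7]}
    order = dfs_order(children, 0)            # [0, 1, 2, 3, 4, 5, 6, 7]
    pes = group_into_pes(order, compute_nodes_per_pe=2)
    # -> heads 0 and 4, compute nodes [1, 2] and [5, 6], tails 3 and 7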
Slide 25
SOSA: Instruction Broadcast
• Instructions broadcast to all nodes
• Instructions decomposed into three "microinstructions" (opcode, registers, synch)
• Can reach nodes/PEs at different times (5 before 9)
[Figure: a broadcast entering the spanning tree and crossing PE boundaries]
Slide 26
SOSA: Instruction Execution
• Instructions execute asynchronously within/across PEs
• XOR is parallel within a PE vs. addition, which is serial within a PE (see the sketch below)
• ISA: three register operands, predication, optimizations; see paper for details…
[Figure: asynchronous execution across the spanning tree and PE boundaries]
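A small Python sketch (with illustrative 2-bit slices; not from the talk) of why the two instructions behave differently: XOR has no dependence between a PE's bit-slice nodes, while ADD must ripple a carry from the LSB node toward the MSB node:

    def xor_per_slice(a_slices, b_slices):
        # Every node can fire at once: no inter-slice dependence.
        return [a ^ b for a, b in zip(a_slices, b_slices)]

    def add_per_slice(a_slices, b_slices, slice_bits=2):
        # The carry ripples node to node: slice i waits on slice i-1.
        mask, carry, out = (1 << slice_bits) - 1, 0, []
        for a, b in zip(a_slices, b_slices):      # LSB slice first
            total = a + b + carry
            out.append(total & mask)
            carry = total >> slice_bits
        return out

    assert xor_per_slice([0b01, 0b10], [0b11, 0b10]) == [0b10, 0b00]
    assert add_per_slice([0b11, 0b00], [0b01, 0b00]) == [0b00, 0b01]  # 3 + 1 = 4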
Slide 27
Two System Configurations
• One large system → latency
• Multiple "cells" with space sharing → throughput
Slide 28
Outline
• Nanostructures & Components
• Circuit Design Issues
• Architectural Implications
• Proposed Architectures
  – SOSA [ASPLOS '06]
  – Node Design
  – Evaluation
• Defect Tolerance
• Conclusion
Slide 29
SOSA Node
• Homogeneous nodes
  – Specialized during configuration
• Asynchronous logic
• Communication
  – 4 transceivers (4-phase handshake)
  – 3 virtual channels (instruction broadcast, ring left & right)
• Computation
  – ALU
  – Register (32 bits: 32x1 or 16x2)
  – Instruction buffer
• Configuration
  – Route setup
• Subcomponent BIST [Nanoarch '06]
[Figure: SOSA node block diagram: four transceivers; three virtual channels with input/output buffers, mux/demux, and routing/route-setup logic; instruction buffer, register file, and ALU; analog and synch control]
Slide 30
SOSA Node
• VHDL
  – ~10K FETs
• Area ≈ 9 µm²
  – Custom layout tools for standard cells
• Power ≈ 6.5 W/cm²
  – Semi-empirical SPICE model [IEEE Nano '04]
  – 1 ns switching time
  – 88% of devices active
  – 0.775 µW / node
• Modern processors > 75 W/cm²
[Figure: node floorplan: four transceivers surrounding configuration logic and compute logic]
Slide 31
Evaluation Methodology
• Custom event simulator (see the sketch below)
  – Conservative 1 ns time quantum (switching time)
  – 2 bits per node (16 registers; 16 + 2 nodes for a 32-bit PE)
• Nine benchmarks
  – Integer code only: no hardware support for floating point
  – Matrix multiplication, image filters (Gaussian, generic, median), encryption (TEA, XTEA), sort, search, bin-packing
• Compare performance to four other architectures
  – Pentium 4 (P4) (real hardware)
  – Ideal out-of-order superscalar (I-SS): 10 GHz, 128-wide, 8K ROB
  – Ideal chip multiprocessor (I-CMP): 16-way ideal
  – Ideal SOSA (I-SOSA): no communication overhead, unit instruction latency
  – Extrapolate for large SOSA systems (back-validated)
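For flavor, a minimal Python skeleton of an event-driven simulator with a conservative 1 ns quantum (a sketch only; the actual simulator's event types and handlers are not described in this deck):

    import heapq

    class EventSim:
        QUANTUM_NS = 1                    # conservative switching-time quantum

        def __init__(self):
            self.now = 0
            self._queue = []              # (time_ns, seq, handler, payload)
            self._seq = 0

        def schedule(self, delay_ns, handler, payload=None):
            t = self.now + max(delay_ns, self.QUANTUM_NS)
            heapq.heappush(self._queue, (t, self._seq, handler, payload))
            self._seq += 1                # seq breaks ties deterministically

        def run(self, until_ns):
            while self._queue and self._queue[0][0] <= until_ns:
                self.now, _, handler, payload = heapq.heappop(self._queue)
                handler(self, payload)

    # Example: a node delivering a microinstruction after one quantum.
    sim = EventSim()
    sim.schedule(1, lambda s, p: print(f"t={s.now}ns deliver {p}"), "opcode")
    sim.run(until_ns=10)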
Slide 32
Matrix Multiply (Execution Time)
• Hand optimizations (loop unrolling, etc.)
• Better scalability than other systems (crossover < 1000)
• Still room for improvement
[Chart: run time (microseconds, log scale) vs. matrix dimension for Pentium 4, Ideal Single Core, Ideal 16-CMP, Ideal SOSA, SOSA, and Extrapolated SOSA]
Slide 33
TEA Encryption (Throughput)
Architecture                   | Encryptions/sec
P4 @ 3 GHz (100 mm²)           | 3.9 M/sec
I-SS                           | 73.62 M/sec
16-CMP                         | 1180 M/sec
SOSA (1 cell, ~0.019 mm²)      | 0.175 M/sec
I-SOSA (1 cell)                | 27.7 M/sec
SOSA (5400 cells, 100 mm²)     | 940 M/sec
I-SOSA (5400 cells)            | 72300 M/sec
• Used in XBOX
• Shift, add, and xor (see the sketch below)
• 64-bit data blocks
• 128-bit key
• Pipelined on 64 PEs
• Configure multiple cells of 64 PEs
• Single cell: poor
• 200x better than P4 in the same area
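For reference, the TEA kernel being pipelined is tiny (shifts, adds, and XORs on a 64-bit block under a 128-bit key), which is what makes it a good fit for simple PEs. A Python sketch using the standard published TEA round; the key and plaintext below are arbitrary example values:

    def tea_encrypt(v0, v1, key, rounds=32):
        # v0, v1: 32-bit halves of the block; key: four 32-bit words.
        k0, k1, k2, k3 = key
        delta, total, mask = 0x9E3779B9, 0, 0xFFFFFFFF
        for _ in range(rounds):
            total = (total + delta) & mask
            v0 = (v0 + (((v1 << 4) + k0) ^ (v1 + total) ^ ((v1 >> 5) + k1))) & mask
            v1 = (v1 + (((v0 << 4) + k2) ^ (v0 + total) ^ ((v0 >> 5) + k3))) & mask
        return v0, v1

    c0, c1 = tea_encrypt(0x01234567, 0x89ABCDEF,
                         (0xDEADBEEF, 0xCAFEBABE, 0x00112233, 0x44556677))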
Slide 34
Outline
• Nanostructures & Components
• Circuit Design Issues
• Architectural Implications
• Proposed Architectures
• Defect Tolerance (not transient faults)
• Conclusion
Slide 35
Defect Tolerance
• Simple fail-stop model
• Encryption gracefully degrades
• MXM: < 10% degradation up to 20% defective nodes
[Charts: TEA throughput (million encryptions/sec) vs. node defect rate (0-40%); normalized run time vs. node defect rate for matrix multiply and encryption]
Slide 36
Node Failure Modes [Nanoarch '06]
[Figure: node diagrams for the failure modes: Simple, Communication-Centric, Compute-Centric, and Hybrid (any two components)]
• Exploit modular node design
  – VHDL BIST for communication & configuration (all stuck-at faults)
  – Assume software test for compute logic
• Configuration logic is critical
Slide 37
Evaluation
• Simple node model in C
• Model network with 10,000 nodes
• Vary transistor defect probability from 0% to 0.1%
  – Map defective transistors to defective components
• Average 500 runs per data point
• How much do we benefit from node modularity? (see the sketch below)
  – What device defect probability can it handle?
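A Python re-sketch of the core step of this experiment (the talk's model is in C, and its per-component transistor budgets are not given here, so the numbers below are illustrative assumptions): transistors fail independently, a component fails if any of its transistors do, and the failure mode decides whether the node is still usable:

    import random

    # Assumed transistor budgets per component (not from the talk).
    COMPONENTS = {"communication": 4000, "compute": 4000, "configuration": 2000}

    def component_ok(p, n_transistors, rng):
        # A component survives only if all of its transistors are defect-free.
        return rng.random() < (1.0 - p) ** n_transistors

    def fraction_usable(p, model, trials=500, seed=0):
        rng = random.Random(seed)
        usable = 0
        for _ in range(trials):
            ok = [component_ok(p, n, rng) for n in COMPONENTS.values()]
            if model == "simple":      # any defect kills the whole node
                usable += all(ok)
            elif model == "hybrid":    # still useful if any two components survive
                usable += sum(ok) >= 2
        return usable / trials

    for p in (1e-6, 1e-5, 1e-4, 1e-3):
        print(p, fraction_usable(p, "simple"), fraction_usable(p, "hybrid"))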
Slide 38
Results: Usable Nodes
[Chart: % defective nodes vs. device failure probability (10⁻⁶ to 10⁻³) for Simple, Communication-Centric, Compute-Centric, and Hybrid (any two)]
• Hybrid failure mode can tolerate a higher device failure probability
  – Three orders of magnitude greater than typical CMOS designs (10⁻⁴ vs. 10⁻⁷)
Slide 39
Results: Reachable Nodes
[Chart: % nodes reachable vs. device failure probability (0 to 4.0x10⁻⁴) for Simple, Compute-Centric, Hybrid-Total, and Hybrid (compute only)]
• Hybrid increases the number of reachable nodes
  – More nodes with functioning compute logic are reachable and usable
Slide 40
Fail-Stop Summary
• Test logic detects defects in node components
• Modular node design enables partial node operation
• Node is useful if
  – It can compute
  OR
  – It can improve system connectivity
• Hybrid failure mode increases available nodes
  – Can help tolerate a device failure probability of 1.5x10⁻⁴ (1000 times greater than typical CMOS designs)
Slide 41
SOSA Summary
• Distributed algorithm for structure & defect tolerance
  – No external defect map
• Configuration groups nodes into SIMD PEs
• High utilization w/ familiar programming model
• Ability to reconfigure
  – One system for latency-critical systems
  – Multiple cells for throughput systems
• Limitations: I/O bandwidth, general-purpose codes, FP, transient faults
Slide 42
Conclusion
• Future limits on traditional CMOS scaling
  – Multicore, etc. → tera/peta scale w/ 1M nodes
• Defects, cost of fabrication, process variation, etc.
• High performance, low power despite randomness and defects
[Figure: engineered DNA nanostructures + nanoelectronics = the computers of tomorrow]
Slide 43
Duke Nanosystems Overview
[Figure: the Duke nanosystems stack:
  DNA Self-Assembly [FNANO 2005, Ang. Chemie 2006, DAC 2006]
  Nano Devices: electronic, optical, etc. [Nanoletters 2006]
  Large Scale Interconnection [NANONETS 2006]
  Circuit Architecture [FNANO 2004]
  Logical Structure & Defect Isolation [NANOARCH 2005]
  SOSA - Data Parallel Architecture [NANOARCH 2006, ASPLOS 2006]
  NANA - General Purpose Architecture [JETC 2006]]
Slide 44
Generic Filter (Execution Time)
• 3x3 generic filter (Gaussian & median similar)
[Chart: run time (nanoseconds, log scale) vs. image width for P4, I-SS, 16-CMP, I-SOSA, SOSA, and Extrapolated SOSA]
Slide 45
Circuit Architecture
• Unit cell based on lattice cavity
  – Place uniform-length nanoelectronic devices
  – Reduces probability of partial matches
  – Two layers of interconnect
• Achieve balance between
  – Regularity of DNA lattice
  – Complexity required for circuits
  – Defect tolerance
• Node: DNA lattice with CNFETs
[Figure: 20 nm unit cell with carbon nanotubes, metal nanoparticles, Vdd plane, ground plane, insulating layer, and interconnect layers]
Slide 46
Fail-Stop Transceivers
• Minimize test overhead
  – Reuse node hardware during test
• Hardware test (see the sketch below)
  – Send '0' and '1' in a loop
  – If data returns, enable the component
  – If data does not return, the component remains disabled
• Similar principle for configuration logic
• Modular design enables graceful degradation
[Figure: transceiver with transmit/receive logic, input/output buffers, test logic, and a test loopback path; TEST_OK=0 until the test passes]
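In software terms, the loopback test amounts to the following Python sketch (an illustration; the Transceiver class and fault-injecting loopbacks are made up here, and the real test is the VHDL BIST reusing node hardware):

    class Transceiver:
        def __init__(self, loopback):
            self.loopback = loopback     # hardware loopback path under test
            self.test_ok = 0             # TEST_OK = 0 until proven good

        def self_test(self):
            for bit in (0, 1):           # send '0' and '1' in a loop
                if self.loopback(bit) != bit:
                    return False         # no (or garbled) return: stay disabled
            self.test_ok = 1             # data returned intact: enable component
            return True

    good = Transceiver(lambda b: b)      # healthy path echoes both bits
    stuck = Transceiver(lambda b: 0)     # stuck-at-0 fault never returns a '1'
    assert good.self_test() and good.test_ok == 1
    assert not stuck.self_test() and stuck.test_ok == 0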