dynamically programmable array architecture
![Page 1: Dynamically Programmable Array Architecture](https://reader036.vdocument.in/reader036/viewer/2022062322/56815175550346895dbface9/html5/thumbnails/1.jpg)
Confidential
Dynamically Programmable Array Architecture
Robert Heaton
Obsidian Technology
![Page 2: Dynamically Programmable Array Architecture](https://reader036.vdocument.in/reader036/viewer/2022062322/56815175550346895dbface9/html5/thumbnails/2.jpg)
Mesh of Trees
- Busses are bidirectional
- 2 cycles to exchange data
- Separate X and Y dimensions
- Diagonal routing not directly supported
- PUs difficult to program to take advantage of structure
[Figure: PUs connected as a mesh of trees]
![Page 3: Dynamically Programmable Array Architecture](https://reader036.vdocument.in/reader036/viewer/2022062322/56815175550346895dbface9/html5/thumbnails/3.jpg)
Two Dimensional Mesh
[Figure: PUs arranged in a two-dimensional mesh]
![Page 4: Dynamically Programmable Array Architecture](https://reader036.vdocument.in/reader036/viewer/2022062322/56815175550346895dbface9/html5/thumbnails/4.jpg)
4x4 Hierarchical Cluster
[Figure: 16 PUs in four groups of four, each group attached to a routing unit (RU), with one further RU joining the four groups]
![Page 5: Dynamically Programmable Array Architecture](https://reader036.vdocument.in/reader036/viewer/2022062322/56815175550346895dbface9/html5/thumbnails/5.jpg)
Simple 4x4 Cluster Wiring
- Bus width = 140u for 16 bit busses
- That is a lot of wires!
- Budget 4x4 cluster area is 1mm2
[Figure: cluster wiring for four PUs; labels: N, Hin1, Hadr12L-2, Hout1, Switch, Joint, 1.4, 6*N Wires, M2 Pitch]
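The quoted bus width can be roughly cross-checked from the figure labels. The sketch below assumes that "6*N Wires" and "1.4" mean 6*N wires laid out at a metal-2 pitch of 1.4u; that reading of the figure is an assumption, and the function name is hypothetical.

```python
# Rough cross-check of the cluster wiring width.
# Assumption: the figure labels "6*N Wires" and "1.4" mean 6*N wires
# routed side by side at a metal-2 pitch of 1.4 microns.
def bus_channel_width_um(n_bits: int, buses: int = 6, m2_pitch_um: float = 1.4) -> float:
    """Width in microns of a channel carrying `buses` busses of `n_bits` wires each."""
    return buses * n_bits * m2_pitch_um


width = bus_channel_width_um(16)
print(f"{width:.1f} um")
```

For N = 16 this gives roughly 134u, close to the 140u on the slide, which suggests the slide figure is rounded or includes a few extra control wires.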
![Page 6: Dynamically Programmable Array Architecture](https://reader036.vdocument.in/reader036/viewer/2022062322/56815175550346895dbface9/html5/thumbnails/6.jpg)
Routing Hierarchy
- 256 PUs
- 4 levels of hierarchy
- Hadr: up level till
- L0adr: local address
- L1adr: level 1 address
- L2adr: level 2 address
- L3adr: level 3 address
[Figure: 256 PUs in groups of four, each group attached to an RU; every four RUs share an RU1, every four RU1s share an RU2, and the four RU2s share a single RU3]
Address fields: Hadr | L0adr | L1adr | L2adr | L3adr
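With four children per routing unit and four levels, the hierarchy addresses 4^4 = 256 PUs, so a PU position splits into four 2-bit level addresses. The sketch below illustrates that split. The flat PU numbering (L0adr as the least significant digit) and both function names are assumptions made for illustration; the slides do not define them.

```python
LEVELS = 4          # L0adr .. L3adr, as listed on the slide
BITS_PER_LEVEL = 2  # each level selects one of 4 children


def split_address(pu_index: int) -> list[int]:
    """Return [L0adr, L1adr, L2adr, L3adr] for a flat PU index (assumed numbering)."""
    if not 0 <= pu_index < 4 ** LEVELS:
        raise ValueError("PU index out of range")
    mask = (1 << BITS_PER_LEVEL) - 1
    return [(pu_index >> (BITS_PER_LEVEL * level)) & mask for level in range(LEVELS)]


def levels_to_climb(src: int, dst: int) -> int:
    """Number of routing levels a transfer crosses before src and dst share an RU."""
    a, b = split_address(src), split_address(dst)
    for level in range(LEVELS - 1, -1, -1):
        if a[level] != b[level]:
            return level + 1
    return 0  # same PU


print(split_address(27))        # [3, 2, 1, 0]
print(levels_to_climb(0, 3))    # 1: both PUs hang off the same local RU
print(levels_to_climb(0, 255))  # 4: the path has to reach RU3
```

One plausible use of such a count is to decide how far up the tree a source must be requested, but the slides leave the exact meaning of Hadr open.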
![Page 7: Dynamically Programmable Array Architecture](https://reader036.vdocument.in/reader036/viewer/2022062322/56815175550346895dbface9/html5/thumbnails/7.jpg)
Weeks Investigation (9/12/97)
- Investigate routing structures
- Dynamic routing assignment/programming
- Compromise between area and flexibility
- Support for tree of trees
- Not a complete story yet!
![Page 8: Dynamically Programmable Array Architecture](https://reader036.vdocument.in/reader036/viewer/2022062322/56815175550346895dbface9/html5/thumbnails/8.jpg)
Routing Unit
- Full duplex connect busses
- Each PU node controls its source port via a 2 bit local or 6 bit hierarchical address
- Broadcast support
- Any node may listen to any other input to the cluster
- Hierarchical node addressing must not clash
[Figure: four Process Units (PU) attached to one Routing Unit (RU)]
![Page 9: Dynamically Programmable Array Architecture](https://reader036.vdocument.in/reader036/viewer/2022062322/56815175550346895dbface9/html5/thumbnails/9.jpg)
Routing Unit PU Port Detail
- Port numbering is clockwise & relative to each PU port
- HBUS port is always at port 3
[Figure: PU port detail with inputs from port 0, port 1, port 2 and port H, select lines s0 and s1, and signals PU Input, PU Output, PU Input address and "to other ports"; width labels 6, N, N, 2, 4]
![Page 10: Dynamically Programmable Array Architecture](https://reader036.vdocument.in/reader036/viewer/2022062322/56815175550346895dbface9/html5/thumbnails/10.jpg)
PU Overview
- Simple data path functionality
- Primitive control options
- Wide instructions control data path function and operand routing
- Conditions may be inverted for “repeat until” or “Branch If” control
- Very primitive address arithmetic
- 32 or less instructions in program
![Page 11: Dynamically Programmable Array Architecture](https://reader036.vdocument.in/reader036/viewer/2022062322/56815175550346895dbface9/html5/thumbnails/11.jpg)
N Bit Functional Unit
- Logic functions: OR, XOR, AND, 0, 1
- Arithmetic: Add, subtract, Multiply
- Shifts: single bit left and right
- Conditional detection: 0, -1, <0, >0
- More optimization needed
- Routing issues need more work
[Figure: functional unit block diagram; labels: ALU/MULT, DFF, Bit Shift, Carry Logic, Const bit, ALUCTL, SFTCTL, mux0, mux1, mux2, A, F, Cin, Cout, LSin, RSin]
![Page 12: Dynamically Programmable Array Architecture](https://reader036.vdocument.in/reader036/viewer/2022062322/56815175550346895dbface9/html5/thumbnails/12.jpg)
N Bit Functional Unit (V2)
- Logic functions: OR, XOR, AND, 0, 1
- Arithmetic: Add, subtract
- Shifts: right and left shifts
- Conditional detection: 0, <0, >0, OF
- Memory mapped RAM access to operands
[Figure: V2 functional unit block diagram; labels: ALU, DFF, B Shift, Carry Logic, ALUCTL, SFTCTL, mux0, mux1, mux2, Out, Cin, Cout, LSin, RSin, two N bit RAM blocks, Operands, Multiply Sequencer]
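The V2 function list can be read as a small behavioural model. The sketch below is an illustration only: the operation names, the dictionary of flags, and the treatment of the logic function "1" as the numeric constant 1 are all assumptions, and nothing here reflects the real ALUCTL encoding.

```python
def alu(op: str, a: int, b: int, n: int = 16):
    """Return (result, flags) for an n-bit two's-complement operation."""
    mask = (1 << n) - 1
    sign = 1 << (n - 1)
    a &= mask
    b &= mask
    overflow = False
    if op == "add":
        r = (a + b) & mask
        # Signed overflow: operands share a sign that the result does not.
        overflow = bool(~(a ^ b) & (a ^ r) & sign)
    elif op == "sub":
        r = (a - b) & mask
        # Signed overflow: operands differ in sign and the result sign differs from a.
        overflow = bool((a ^ b) & (a ^ r) & sign)
    elif op == "and":
        r = a & b
    elif op == "or":
        r = a | b
    elif op == "xor":
        r = a ^ b
    elif op == "zero":
        r = 0
    elif op == "one":
        r = 1  # assumption: "1" on the slide means the constant 1
    else:
        raise ValueError(f"unknown op {op!r}")
    negative = bool(r & sign)
    flags = {
        "zero": r == 0,                     # condition "0"
        "neg": negative,                    # condition "<0"
        "pos": r != 0 and not negative,     # condition ">0"
        "of": overflow,                     # condition "OF"
    }
    return r, flags


result, flags = alu("add", 0x7FFF, 1)
print(hex(result), flags)
```

Adding 1 to the largest positive 16 bit value wraps to 0x8000 and raises both the negative and overflow conditions, which is the case the OF detection exists to catch.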
![Page 13: Dynamically Programmable Array Architecture](https://reader036.vdocument.in/reader036/viewer/2022062322/56815175550346895dbface9/html5/thumbnails/13.jpg)
Instruction Fields
?? + XN Bits per context

| Field | Comment | Bits |
| --- | --- | --- |
| ALU_CTL | Control of Basic ALU Functions | 5 |
| SHIFT_CTL | Control of the operand shift | 2 |
| MUX_CTL | Control operand muxes | 3 |
| BRANCH_ADR | Next address if condition true | 2 |
| COND_MSK | Condition mask | 5 |
| COND_FLD | Condition field | 5 |
| EXT_COND_SRC | Select source for external condition inputs | 2 |
| HEIR_ADDR | Hierarchical routing level address | 2 |
| L0_ADDR | Level 0 source address | 2 |
| L1_ADDR | Level 1 source address | 2 |
| L2_ADDR | Level 2 source address | 2 |
| L3_ADDR | Level 3 source address | 2 |
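The field widths listed for this slide add up to 34 bits. As a minimal sketch of how such a context word could be assembled, the code below packs the fields into one integer. The field names and widths come from the slide; the packing order (first field most significant) and the `pack`/`unpack` helpers are assumptions for illustration.

```python
# Field widths copied from the slide; the packing order is an assumption.
FIELDS = [
    ("ALU_CTL", 5), ("SHIFT_CTL", 2), ("MUX_CTL", 3), ("BRANCH_ADR", 2),
    ("COND_MSK", 5), ("COND_FLD", 5), ("EXT_COND_SRC", 2), ("HEIR_ADDR", 2),
    ("L0_ADDR", 2), ("L1_ADDR", 2), ("L2_ADDR", 2), ("L3_ADDR", 2),
]
TOTAL_BITS = sum(width for _, width in FIELDS)


def pack(values: dict[str, int]) -> int:
    """Pack field values into one word; fields not given default to 0."""
    word = 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack(word: int) -> dict[str, int]:
    """Inverse of pack: split a word back into named fields."""
    values = {}
    for name, width in reversed(FIELDS):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values


word = pack({"ALU_CTL": 3, "MUX_CTL": 5, "L0_ADDR": 2})
print(TOTAL_BITS, unpack(word)["MUX_CTL"])
```

The 34 bit total is slightly more than the 32 bits shown on the instruction types slide, so some of these widths were presumably still in flux.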
![Page 14: Dynamically Programmable Array Architecture](https://reader036.vdocument.in/reader036/viewer/2022062322/56815175550346895dbface9/html5/thumbnails/14.jpg)
PU Instruction Types
- Data Process (00): ALU_CTL, SFT_CTL, MUX_CTL, ROUTE_CTL
- Move (01)
- Immediate Operand
- Multiply (100)
- Attention (101): Options, Flag, Condition, Branch_Adr
- Branch (110): Options, Link, Condition, Branch_Adr
Other field labels in the figure: Operand_Value, OP_SEL, R/W, Options, OP_SEL
Condition Field (15 Bits): Invert | +ve | OF | -ve | zero | X1 | X0 | Condition Mask | Ext’ Source Sel
ROUTE_CTL Field: Hadr | L0adr | L1adr | L2adr | L3adr
Instruction width: 32 Bits
![Page 15: Dynamically Programmable Array Architecture](https://reader036.vdocument.in/reader036/viewer/2022062322/56815175550346895dbface9/html5/thumbnails/15.jpg)
Condition Field
- X[1:0] are external condition bits & may be sourced from:
  - Operand bits
  - Global synchronization bus
  - Nearest neighbour condition outputs
- Condition Mask is ANDed with flag bits
Condition Field (15 Bits): Invert | +ve | OF | -ve | zero | X1 | X0 | Condition Mask | Ext’ Source Sel
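A minimal sketch of how the condition could be evaluated, based on the statement that the condition mask is ANDed with the flag bits and that conditions may be inverted. Reducing the masked flags with OR, and representing flags and mask as dictionaries, are assumptions; the slides do not say how the masked bits are combined.

```python
FLAG_NAMES = ["+ve", "OF", "-ve", "zero", "X1", "X0"]  # order as listed on the slide


def condition_true(flags: dict[str, bool], mask: dict[str, bool], invert: bool = False) -> bool:
    """AND each flag with its mask bit, OR the results (assumed), then optionally invert."""
    hit = any(flags.get(name, False) and mask.get(name, False) for name in FLAG_NAMES)
    return hit != invert


# "Branch If zero": taken only when the zero flag is set.
print(condition_true({"zero": True}, {"zero": True}))
# "Repeat until zero": the inverted test stays true while zero is clear.
print(condition_true({"zero": False}, {"zero": True}, invert=True))
```

Under this reading, a single invert bit is enough to turn a "Branch If" test into a "repeat until" loop condition.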
![Page 16: Dynamically Programmable Array Architecture](https://reader036.vdocument.in/reader036/viewer/2022062322/56815175550346895dbface9/html5/thumbnails/16.jpg)
Static Program
- PU never changes function
- Branch is set to always true
- Just two instructions
[Figure: two-instruction program: Data Process, then Branch (Always, Adr +1)]
![Page 17: Dynamically Programmable Array Architecture](https://reader036.vdocument.in/reader036/viewer/2022062322/56815175550346895dbface9/html5/thumbnails/17.jpg)
More Typical Program
![Page 18: Dynamically Programmable Array Architecture](https://reader036.vdocument.in/reader036/viewer/2022062322/56815175550346895dbface9/html5/thumbnails/18.jpg)
Open Issues
- PU data path width
- Complexity of shift operations
- RU trunking
- Number of contexts per PU
- Flexible context RAM partitioning
- Improve PU synchronization
![Page 19: Dynamically Programmable Array Architecture](https://reader036.vdocument.in/reader036/viewer/2022062322/56815175550346895dbface9/html5/thumbnails/19.jpg)
Shifter Instructions
![Page 20: Dynamically Programmable Array Architecture](https://reader036.vdocument.in/reader036/viewer/2022062322/56815175550346895dbface9/html5/thumbnails/20.jpg)
Design Tools
- PU Assembler
- Architecture mapping
- Global resource allocation
![Page 21: Dynamically Programmable Array Architecture](https://reader036.vdocument.in/reader036/viewer/2022062322/56815175550346895dbface9/html5/thumbnails/21.jpg)
Conditional N Bit PU Cell
[Figure: PU cell block diagram; labels: ALU/MULT, DFF, Bit Shift, Carry Logic, Const bit, ALUCTL, SFTCTL, mux0, mux1, mux2, A, B, F, Cin, Cout, LSin, RSin, RAM, ColSel, Condition Logic, EXT[1:0], Address Logic, Branch, Input, Out, Port address]
![Page 22: Dynamically Programmable Array Architecture](https://reader036.vdocument.in/reader036/viewer/2022062322/56815175550346895dbface9/html5/thumbnails/22.jpg)
Commercial Viability
- X5 performance improvement over conventional solutions (mix of cost & power)
- Conceptually simple
- Clearly defined target applications
- Simple systems connections
- Scaleable
- Support hardware & software standards
![Page 23: Dynamically Programmable Array Architecture](https://reader036.vdocument.in/reader036/viewer/2022062322/56815175550346895dbface9/html5/thumbnails/23.jpg)
Conditional N Bit DPA Cell
[Figure: DPA cell block diagram surrounded by four Routing Matrix blocks; labels: ALU, DFF, Bit Shift, Carry Logic, Const bit, ALUCTL, SFTCTL, mux0, mux1, mux2, A, B, F, Cin, Cout, LSin, RSin, RAM, ColSel, Condition Logic, EXT[1:0], Address Logic, Branch]
4 Bit Cell: 180 Gates, 112 Bits RAM
![Page 24: Dynamically Programmable Array Architecture](https://reader036.vdocument.in/reader036/viewer/2022062322/56815175550346895dbface9/html5/thumbnails/24.jpg)
N Bit Wide DPA
[Figure: three identical columns, each with an N bit wide FU (inputs A and B, output C), Condition Logic, Status Reg, FU Decode, M Plane RAM and Program Storage]
![Page 25: Dynamically Programmable Array Architecture](https://reader036.vdocument.in/reader036/viewer/2022062322/56815175550346895dbface9/html5/thumbnails/25.jpg)
N Bit Wide PU Block
[Figure: PU block diagram; labels: N bit wide ALU, Status Reg, Condition Logic, A, B, I Decode, Addr Logic, Inst RAM, N Bit wide Shift, Local RAM, two Arbit blocks, State, HierBus, BusW, BusX, two PipeBus]
Instruction Format: Status Msk | Source A | Source B | Shift Op | OP Code
NOTES/QUESTIONS
- Inst has no const, but has offsets.
- Inst RAM can be small. 64 words?
- Note: counter takes 3 instructions.
- How much subroutine support? None?
- Simplified 16 bit or full 32 bit instructions.
- 2 or 4 local area busses?
- Synchronization issue: Master states accessible, Cond mask use.
- Option to break or combine N bit DP elements?
- Resource pool on busses? E.g. MULT?
- Approx. size of 32 bit FU 800u x 500u?
  - If so a 16x8 processor array is possible.
  - I.e. 128 processors at 100MHz = 12800MIPS
- Turn off till global state instruction for power reduction
- Handling of interrupts (if at all)
  - Handle global signal interrupts how?
- Multiple bit wide segmentation through masks? E.g. 2 counters in one PU?
![Page 26: Dynamically Programmable Array Architecture](https://reader036.vdocument.in/reader036/viewer/2022062322/56815175550346895dbface9/html5/thumbnails/26.jpg)
Potential Configuration
- 128 32 Bit “Pico” Process Units
- 12800MIPS @ 100MHz
- 80mm2 in 0.35u CMOS
- Concept of hierarchical hardware scope
- Very fast streaming operations
- Simple PU programming model
- Applications:
  - Video processing
  - LAN Routing
  - DSP
  - Fast Prototyping
[Figure: chip block diagram; labels: 16 x 8 PU ARRAY, MUX/DMA/FIFO, RAMBUS Interface, Controller, 256 Global Ram]
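The headline numbers follow from simple arithmetic, sketched below. The peak figure assumes every PU retires one instruction per clock, and the per-PU footprint is the 800u x 500u estimate from the notes, which the deck itself marks as a question; variable names are illustrative.

```python
ROWS, COLS = 8, 16            # "16 x 8 PU ARRAY"
CLOCK_MHZ = 100
FU_W_UM, FU_H_UM = 800, 500   # "800u x 500u?" from the notes, itself a guess

pus = ROWS * COLS
# Assumption: one instruction per PU per clock cycle.
peak_mips = pus * CLOCK_MHZ
# Area of the PU array alone, in square millimetres.
pu_area_mm2 = pus * (FU_W_UM / 1000) * (FU_H_UM / 1000)

print(pus, peak_mips, round(pu_area_mm2, 1))
```

This gives 128 PUs, 12800 MIPS peak, and about 51 mm2 for the PU array, which would fit inside the quoted 80mm2 with the remainder available for routing, global RAM and the interface blocks.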
![Page 27: Dynamically Programmable Array Architecture](https://reader036.vdocument.in/reader036/viewer/2022062322/56815175550346895dbface9/html5/thumbnails/27.jpg)
PU Program Environment
- Operands: BusW, BusX, Accumulator, HierBus, PipeBus, Local Ram.
- Use: PU typically runs a small program
  - May be as little as two instructions
  - 64 words of code maximum
- Instruction types:
  - Arithmetic, logical
  - Data moving
  - Interrupt

| Function | Instructions |
| --- | --- |
| Arithmetic | 1 |
| Counter | 1-2 |
| Mux | 1 |
| Multiply Accumulate | 3 |
| FIFO Stage | 3 |
| Multiport Register | 1 |
| Shift Register | 2 |
![Page 28: Dynamically Programmable Array Architecture](https://reader036.vdocument.in/reader036/viewer/2022062322/56815175550346895dbface9/html5/thumbnails/28.jpg)
Architecture Figures of Merit
- Average density vs application specific cells
- Speed of applications vs hardwired logic
- Percentage reuse
![Page 29: Dynamically Programmable Array Architecture](https://reader036.vdocument.in/reader036/viewer/2022062322/56815175550346895dbface9/html5/thumbnails/29.jpg)
Next Steps
- VHDL modeling of architecture
- Primitive assembler tools for PUs
- Selection, coding and simulation of applications
- Architecture tuning
- Layout and verification of complete DPA
![Page 30: Dynamically Programmable Array Architecture](https://reader036.vdocument.in/reader036/viewer/2022062322/56815175550346895dbface9/html5/thumbnails/30.jpg)
Design Tools
- Tanner: Schematic entry, logic simulation, custom layout, layout verification. Circuit simulation. PC & Sun platforms. MOSIS libraries.
- Mentor Graphics: VHDL compilation and simulation.
![Page 31: Dynamically Programmable Array Architecture](https://reader036.vdocument.in/reader036/viewer/2022062322/56815175550346895dbface9/html5/thumbnails/31.jpg)
Basic FU Routing
[Figure: routing between an array of FUs]