programmable logic and moore'ssplconf.org/spl11/data/uploads/spl 2011 keynote_steve.pdf ·...
Post on 23-Apr-2020
2 Views
Preview:
TRANSCRIPT
Programmable Logic and Moore's
Two Laws
Steve Trimberger, Xilinx Fellow
Programmable Logic Directions
Page 2
Moore
Moore’s Law
“The number of transistors on an integrated
circuit doubles every 12 months.”
Page 3
Page 4
FPGA Capacity Trends
Largest Xilinx FPGA
Page 5
FPGA Performance Trends
Historical
FPGA data
Extrapolation
from ITRS
Page 6
FPGA Energy Trends (W / LC MHz)
Driven by lower
capacitance and
voltage
Page 7
The Future is
Digital
A fresh look at some history
Page 8
4 Ages 9
Gates
1985
10M
1M
10K
100K
1K
1989 1993 1997 2001
XC2000
XC3000
XC4000
Virtex
Virtex-IIApprox. 65% growthper year
• 1984-1991 Invention
• 1992-1999 Expansion
• 2000-2007 Accumulation
• 2008-
of 65
The “Ages” of FPGAs
4 Ages 10
Tight technology limits
Efficiency is key
Must innovate architecturally
FPGAs are much smaller than
the application problem size
FPGAs are “Glue Logic”
Design automation is secondary to capacity
Vendors must own tools
of 123
1985-1991 The Age of Invention
4 Ages 11
“The Architecture of the Month”
TC
IBMCLCAL
QL
ACT
CP
MPA
ERA
Virtex
Apex
XC4000
XC2000
XC3000
AT
6200
Am4000
ORCADL
XC5200FLEX
The Architectural Shakeout
Many devices disappeared in the mid
1990s
Xilinx: 8100, 6200, 4700, Prizm, …
Plessey, Toshiba, Motorola, IBM, ...
We were hit by fast-moving CMOS
process technology, particularly
multiple metal layers.
Of 123
4 Ages 12
Co
urt
esy a
nd
Co
pyri
ght
of
UM
C
0
10000
20000
30000
40000
50000
60000
70000
80000
90000
100000
1985 1990 1995 2000 2005
0
1000
2000
3000
4000
5000
6000
7000
8000
9000
10000
LUTs
Wire
Process Technology– Rapid scaling with cheap transistors and cheaper interconnect
– Ride the technology wave. Specialty processes limit scaling
Applications– FPGA size approaches the problem size
– Large reconfigurable devices enable communications and computation applications during internet “land grab”
Ease-of-Design Becomes Critical– Synthesis flow becomes possible, then dominates
– Interconnect-starved architectures die
1992-1999 The Age of Expansion
4 Ages 13
FPGAs Close on ASICs
1995 1996 1997 1998 1999 2001
Lo
g S
cale
Source: Synopsys, Gartner Dataquest, VLSI Technology, Xilinx
2003
FPGA Capacity
(48% CAGR)
125K160K
200K250K
300K375K600K
950K
1,500K
2,400K
3,800K
6,100KGates/cm2
Moore’s Law
(59% CAGR)
Average Cell-based
Design Start
(25% CAGR)
475K
9,700K
$10 FPGA
4 Ages 14
Gates
Routing
XC2000
Special
Arithmetic
Functions
XC4000
Memory
XC4000
High
Performance
I/O
Virtex
System
Clock
Management
Virtex
3.125 Gigabit
Transceiver
Virtex-II Pro
DSP
Virtex4Microprocessor
Virtex-II Pro Ethernet
MAC
Virtex4
2000-2007 The Age of Accumulation
4 Ages 15
Process Technology– Process and design complexity eliminates “casual” ASIC users
Applications– FPGAs are larger than the typical “problem size”
– We are implementing complete systems
– Standards are increasingly important
– Random logic capacity limited by I/O and memory bandwidth
– Power is a growing concern
– Post-bubble cost pressure (!)
Design effort takes on new dimensions– Not just glue logic anymore: systems issues come to the fore
– Complete digital systems on FPGAs require new design skills
2000-2007 The Age of Accumulation
Moore’s Second Law
The cost of a semiconductor fab doubles
every four years.Page 16
Moore’s Second Law
Page 18
http://www.kurzweilai.net
Capital equipment is more expensive
Mask tooling is more expensive
More subtle effects must be considered
Large Return Required to Justify the Expense
Page 19
Fewer Integrated Circuit Designs
Page 20
Page 22
The Future is
Programmable
Programmable Logic Directions
Page 23
More than Moore Moore
Transceiver Speed Expands Rapidly
Page 24
0
5
10
15
20
25
30
2000 2002 2004 2006 2008 2010 2012
Gb
ps
Programmable Logic Drives 3DStacked Silicon Interconnect
Page 25
Tens of thousands of
inter-die connections
Problems solved include
yield, reliability, heat
dissipation, signal
integrity
Invisible to users
Delivery: 2011
What
will
they
add
next?
What
will
they
add
next?
Return to the “Age of Accumulation?”
Gates
Routing
XC2000
Gates
Routing
XC2000
Dedicated
Arithmetic
Functions
XC4000
Dedicated
Arithmetic
Functions
XC4000
Memory
XC4000
Memory
XC4000
High
Performance
I/O
Virtex
High
Performance
I/O
Virtex
System
Clock
Management
Virtex
System
Clock
Management
Virtex
Transceivers
Virtex-II Pro
Transceivers
Virtex-II Pro
DSP
Virtex4
DSP
Virtex4Microprocessor
Virtex-II Pro
Microprocessor
Virtex-II Pro Ethernet
MAC
Virtex4
Ethernet
MAC
Virtex4ADC
Virtex-5
ADC
Virtex-5
Memory Ctl.
Spartan-6
Memory Ctl.
Spartan-6
Security
Virtex-2
Security
Virtex-2
Programmable Logic Directions
Page 27
More than Moore
More than More than Moore
Moore
Design Costs Grow Exponentially, Too
Page 28
http://www.design-reuse.com
4 Ages 29
Programmable Devices Must Address the Design
Gap
1995 1996 1997 1998 1999 2001
Lo
g S
cale
Source: Synopsys, Gartner Dataquest, VLSI Technology, Xilinx
2003
FPGA Capacity
(48% CAGR)
125K160K
200K250K
300K375K600K
950K
1,500K
2,400K
3,800K
6,100KGates/cm2
Moore’s Law
(59% CAGR)
Average Cell-based
Design Start
(25% CAGR)
475K
9,700K
$10 FPGA
Page 30
Efficient Design
Methodology is
Vital
4 Ages 31
High-level synthesis
HW / SW partition
TimingStandards and interfaces
Termination
Clock distribution
Noise Margin
Crosstalk
DFM IR drop
Repeaters
Startup init
Transmission lines
Clock generation
Giga-Scale Systems
Deep Sub-MicronEmbedded IP
NBTI
AlgorithmsRTL
0, 1 and delay
The Dual Challenges of VLSI
Test
4 Ages 32
High-level synthesis
RTL
0, 1 and delayHW / SW partition
TimingStandards and interfaces
Termination
Clock distribution
Noise Margin
Crosstalk
DFM IR drop
Repeaters
Startup init
Transmission lines
Clock generation
System Design
Platform FPGAEmbedded IP
At Xilinx, we do deep sub-micron design so you
don’t have to.
NBTI
Algorithms
The FPGA Boolean Abstraction
Test
4 Ages 33
Synthesis
RTL
HW / SW partition
Termination
Clock distribution
Noise Margin
Crosstalk
DFM IR drop
Repeaters
Startup init
Transmission lines
Clock generation
ASIC Design
Platform FPGAEmbedded IP
At Xilinx, we do LOGIC design so you don’t have to.
NBTI
Memory hierarchyProcesses CPI
Delay Throughput
Computer Design Network Design
FPGA SystemsEliminating the Boolean Abstraction
Test
Hardware and Software Programmability
Page 34
Zynq: Embedded Processing Platform
Dual ARM Cortex-A9 w/FPU,
L1&L2 Cache, 256KB
Memory, DDR2&DDR3,
ADC…
Up to 235K Programmable
Logic Cells, 400 I/Os,
10.3Gbps transceivers, PCI
Express
AXI between processor and
logic [More than 0,1, delay]
Processor controls FPGA
configuration
– Multiple security levels
supported
– Boot in secure or non-secure mode
– Download PL image via network, SD, USBPage 35
Zynq: Programming
Out-of-the-box SW programmable
– No FPGA design expertise required
Standard OS support
– Dual core ARM A9 base platform
Many Sources of SW and HW IP
– Standardized around AMBA-AXI
– Xilinx, ARM libraries
– 3rd Parties
Industry-Leading Tools
– ARM RVDS Suite & Ecosystem
– Open source GNU tools
– Xilinx ISE® Design Suite
– Xilinx Targeted Design Platforms
Programming
Integrate IP
Test
Release
Debug
Design
Xilinx IP
Partner IP
Custom IP
Integrate IP
Test
Release
Debug
Software
Architect
Hardware
Architect
System
Architect
Page 36
Anatomy of a Targeted Design Platform
Scalable development board
– Enable migration up or down in same FPGA package
– FMC connectors – extend base board functionality, enable
ecosystem
– Pre-configured with working Targeted Reference Design
Targeted Reference Designs
– Optimized for performance and lower resource utilization
– Enable system eval., performance measurement and analysis
Domain optimized design environment
– ISE Design Suite: Embedded Edition
• Hardware design flow and ebedded software development flow
• Advanced connectivity setup and analysis tools
• Support for industry standard AMBA 4 AXI4 interconnect
– Domain-specific tools
Documentation, source code, HW and SW IP cores
Slide 37
@400 MHz
DDR3
GT
P
GT
X
x4 G
en
2/
x8 G
en
1 P
CIe
v2
.0 E
nd
po
int
Co
re
PC
IeW
rap
pe
r
DMA
C2S
S2C
C2S
S2C
MIG
Wra
pp
er
XA
UI
XPS-LL
TEMAC
PC
Iex4/x
8 L
ink
S
Xilinx IP 3rd party IPIntegrated blocks Custom RTL
64-b
it T
RN
@ 2
50 M
Hz
User Space
Registers
Pkt
Control
+CRC
TRN PERFORMANCE
MONITOR
REGISTER
INTERFACE
MU
LT
I-P
OR
T V
IRT
UA
L F
IFO
XAUI
TX
XAUI
RX
RA
W D
AT
A
LO
OP
BA
CK
C2S_Data
C2S_Data
S2C_Data
S2C_Data
C2S_Ctrl
S2C_Ctrl
C2S_Ctrl
S2C_Ctrl
64
64
64
64
64
64
64
64
64 64
64
64
64 64
WR_Data
RD_Data
WR_Data
RD_Data
Control
Control
Control
Control
WR_Data
WR_Data
RD_Data
RD_Data
Control
Control
Control
Control
Data
Data
Control
Control
@250 MHz @250 MHz
@250 MHz
@156.25 MHz @156.25 MHz
Native Interface
of
MIG
64@200 MHz
256
256
Pkt
Control
+CRC
Control
Control
@250 MHz @250 MHz
@400 MHz
DDR3
GT
P
GT
X
x4 G
en
2/
x8 G
en
1 P
CIe
v2
.0 E
nd
po
int
Co
re
PC
IeW
rap
pe
r
DMA
C2S
S2C
C2S
S2C
MIG
Wra
pp
er
XA
UI
XPS-LL
TEMAC
PC
Iex4/x
8 L
ink
S
Xilinx IP 3rd party IPIntegrated blocks Custom RTL
64-b
it T
RN
@ 2
50 M
Hz
User Space
Registers
Pkt
Control
+CRC
TRN PERFORMANCE
MONITOR
REGISTER
INTERFACE
MU
LT
I-P
OR
T V
IRT
UA
L F
IFO
XAUI
TX
XAUI
RX
RA
W D
AT
A
LO
OP
BA
CK
C2S_Data
C2S_Data
S2C_Data
S2C_Data
C2S_Ctrl
S2C_Ctrl
C2S_Ctrl
S2C_Ctrl
64
64
64
64
64
64
64
64
64 64
64
64
64 64
WR_Data
RD_Data
WR_Data
RD_Data
Control
Control
Control
Control
WR_Data
WR_Data
RD_Data
RD_Data
Control
Control
Control
Control
Data
Data
Control
Control
@250 MHz @250 MHz
@250 MHz
@156.25 MHz @156.25 MHz
Native Interface
of
MIG
64@200 MHz
256
256
Pkt
Control
+CRC
Control
Control
@250 MHz @250 MHz
GT
P
GT
X
x4 G
en
2/
x8 G
en
1 P
CIe
v2
.0 E
nd
po
int
Co
re
PC
IeW
rap
pe
r
DMA
C2S
S2C
C2S
S2C
MIG
Wra
pp
er
XA
UI
XPS-LL
TEMAC
PC
Iex4/x
8 L
ink
S
Xilinx IP 3rd party IPIntegrated blocks Custom RTL
64-b
it T
RN
@ 2
50 M
Hz
User Space
Registers
Pkt
Control
+CRC
TRN PERFORMANCE
MONITOR
REGISTER
INTERFACE
MU
LT
I-P
OR
T V
IRT
UA
L F
IFO
XAUI
TX
XAUI
RX
RA
W D
AT
A
LO
OP
BA
CK
C2S_Data
C2S_Data
S2C_Data
S2C_Data
C2S_Ctrl
S2C_Ctrl
C2S_Ctrl
S2C_Ctrl
64
64
64
64
64
64
64
64
64 64
64
64
64 64
WR_Data
RD_Data
WR_Data
RD_Data
Control
Control
Control
Control
WR_Data
WR_Data
RD_Data
RD_Data
Control
Control
Control
Control
Data
Data
Control
Control
@250 MHz @250 MHz
@250 MHz
@156.25 MHz @156.25 MHz
Native Interface
of
MIG
64@200 MHz
256
256
Pkt
Control
+CRC
Control
Control
@250 MHz @250 MHz
AutoESL AutoPilot C to Gates
Page 38
In 2010, BDTI optical flow
benchmark showed quality of output
comparable to manual design.
“In our test of Man vs. Machine;
Machine won hands down! We were
able to create and verify complex
matrix inverse in 5 days vs. 3
months; Algorithm to FPGA speed &
QoR is unbelievable. If I did not
verify in hardware I would think the
tool is lying. “ —MilAero Company
Moore
Page 39
Moore’s Law is still delivering transistor count
– Performance limited by power
– Technology is increasingly difficult to master
Digital wins
Custom silicon getting prohibitively expensive
(“Moore’s Second Law”)
– Custom silicon too expensive for a rising fraction of
applications
Programmable wins
More than More than Moore
Page 40
Addressing the design gap
Devices
Tools
Boards
IP Libraries
Conclusions
void core (
int n, // input size
float *data_in1, // input stream
float *data_in2, // input stream
float *data_out // output stream
) {
int i, j=0;
for (i=0; i<n; i++)
data_out[i] = data_in1[i] + data_in2[i];
}
Programmable logic vendors are the
technology leaders
– [M] Shipping 28nm Technology
– [MtM] 3D Technology
– [MtMtM] Design efficiency: devices, IP, SW
Thank You
top related