programmable logic and moore'ssplconf.org/spl11/data/uploads/spl 2011 keynote_steve.pdf ·...

Post on 23-Apr-2020

2 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Programmable Logic and Moore's

Two Laws

Steve Trimberger, Xilinx Fellow

Programmable Logic Directions

Page 2

Moore

Moore’s Law

“The number of transistors on an integrated

circuit doubles every 12 months.”

Page 3

Page 4

FPGA Capacity Trends

Largest Xilinx FPGA

Page 5

FPGA Performance Trends

Historical

FPGA data

Extrapolation

from ITRS

Page 6

FPGA Energy Trends (W / LC MHz)

Driven by lower

capacitance and

voltage

Page 7

The Future is

Digital

A fresh look at some history

Page 8

4 Ages 9

Gates

1985

10M

1M

10K

100K

1K

1989 1993 1997 2001

XC2000

XC3000

XC4000

Virtex

Virtex-IIApprox. 65% growthper year

• 1984-1991 Invention

• 1992-1999 Expansion

• 2000-2007 Accumulation

• 2008-

of 65

The “Ages” of FPGAs

4 Ages 10

Tight technology limits

Efficiency is key

Must innovate architecturally

FPGAs are much smaller than

the application problem size

FPGAs are “Glue Logic”

Design automation is secondary to capacity

Vendors must own tools

of 123

1985-1991 The Age of Invention

4 Ages 11

“The Architecture of the Month”

TC

IBMCLCAL

QL

ACT

CP

MPA

ERA

Virtex

Apex

XC4000

XC2000

XC3000

AT

6200

Am4000

ORCADL

XC5200FLEX

The Architectural Shakeout

Many devices disappeared in the mid

1990s

Xilinx: 8100, 6200, 4700, Prizm, …

Plessey, Toshiba, Motorola, IBM, ...

We were hit by fast-moving CMOS

process technology, particularly

multiple metal layers.

Of 123

4 Ages 12

Co

urt

esy a

nd

Co

pyri

ght

of

UM

C

0

10000

20000

30000

40000

50000

60000

70000

80000

90000

100000

1985 1990 1995 2000 2005

0

1000

2000

3000

4000

5000

6000

7000

8000

9000

10000

LUTs

Wire

Process Technology– Rapid scaling with cheap transistors and cheaper interconnect

– Ride the technology wave. Specialty processes limit scaling

Applications– FPGA size approaches the problem size

– Large reconfigurable devices enable communications and computation applications during internet “land grab”

Ease-of-Design Becomes Critical– Synthesis flow becomes possible, then dominates

– Interconnect-starved architectures die

1992-1999 The Age of Expansion

4 Ages 13

FPGAs Close on ASICs

1995 1996 1997 1998 1999 2001

Lo

g S

cale

Source: Synopsys, Gartner Dataquest, VLSI Technology, Xilinx

2003

FPGA Capacity

(48% CAGR)

125K160K

200K250K

300K375K600K

950K

1,500K

2,400K

3,800K

6,100KGates/cm2

Moore’s Law

(59% CAGR)

Average Cell-based

Design Start

(25% CAGR)

475K

9,700K

$10 FPGA

4 Ages 14

Gates

Routing

XC2000

Special

Arithmetic

Functions

XC4000

Memory

XC4000

High

Performance

I/O

Virtex

System

Clock

Management

Virtex

3.125 Gigabit

Transceiver

Virtex-II Pro

DSP

Virtex4Microprocessor

Virtex-II Pro Ethernet

MAC

Virtex4

2000-2007 The Age of Accumulation

4 Ages 15

Process Technology– Process and design complexity eliminates “casual” ASIC users

Applications– FPGAs are larger than the typical “problem size”

– We are implementing complete systems

– Standards are increasingly important

– Random logic capacity limited by I/O and memory bandwidth

– Power is a growing concern

– Post-bubble cost pressure (!)

Design effort takes on new dimensions– Not just glue logic anymore: systems issues come to the fore

– Complete digital systems on FPGAs require new design skills

2000-2007 The Age of Accumulation

Moore’s Second Law

The cost of a semiconductor fab doubles

every four years.Page 16

Moore’s Second Law

Page 17

http://www.kurzweilai.net

Moore’s Second Law

Page 18

http://www.kurzweilai.net

Capital equipment is more expensive

Mask tooling is more expensive

More subtle effects must be considered

Large Return Required to Justify the Expense

Page 19

Fewer Integrated Circuit Designs

Page 20

Do Transistors Really Become Free?

Page 21

http://zone.ni.com

Page 22

The Future is

Programmable

Programmable Logic Directions

Page 23

More than Moore Moore

Transceiver Speed Expands Rapidly

Page 24

0

5

10

15

20

25

30

2000 2002 2004 2006 2008 2010 2012

Gb

ps

Programmable Logic Drives 3DStacked Silicon Interconnect

Page 25

Tens of thousands of

inter-die connections

Problems solved include

yield, reliability, heat

dissipation, signal

integrity

Invisible to users

Delivery: 2011

What

will

they

add

next?

What

will

they

add

next?

Return to the “Age of Accumulation?”

Gates

Routing

XC2000

Gates

Routing

XC2000

Dedicated

Arithmetic

Functions

XC4000

Dedicated

Arithmetic

Functions

XC4000

Memory

XC4000

Memory

XC4000

High

Performance

I/O

Virtex

High

Performance

I/O

Virtex

System

Clock

Management

Virtex

System

Clock

Management

Virtex

Transceivers

Virtex-II Pro

Transceivers

Virtex-II Pro

DSP

Virtex4

DSP

Virtex4Microprocessor

Virtex-II Pro

Microprocessor

Virtex-II Pro Ethernet

MAC

Virtex4

Ethernet

MAC

Virtex4ADC

Virtex-5

ADC

Virtex-5

Memory Ctl.

Spartan-6

Memory Ctl.

Spartan-6

Security

Virtex-2

Security

Virtex-2

Programmable Logic Directions

Page 27

More than Moore

More than More than Moore

Moore

Design Costs Grow Exponentially, Too

Page 28

http://www.design-reuse.com

4 Ages 29

Programmable Devices Must Address the Design

Gap

1995 1996 1997 1998 1999 2001

Lo

g S

cale

Source: Synopsys, Gartner Dataquest, VLSI Technology, Xilinx

2003

FPGA Capacity

(48% CAGR)

125K160K

200K250K

300K375K600K

950K

1,500K

2,400K

3,800K

6,100KGates/cm2

Moore’s Law

(59% CAGR)

Average Cell-based

Design Start

(25% CAGR)

475K

9,700K

$10 FPGA

Page 30

Efficient Design

Methodology is

Vital

4 Ages 31

High-level synthesis

HW / SW partition

TimingStandards and interfaces

Termination

Clock distribution

Noise Margin

Crosstalk

DFM IR drop

Repeaters

Startup init

Transmission lines

Clock generation

Giga-Scale Systems

Deep Sub-MicronEmbedded IP

NBTI

AlgorithmsRTL

0, 1 and delay

The Dual Challenges of VLSI

Test

4 Ages 32

High-level synthesis

RTL

0, 1 and delayHW / SW partition

TimingStandards and interfaces

Termination

Clock distribution

Noise Margin

Crosstalk

DFM IR drop

Repeaters

Startup init

Transmission lines

Clock generation

System Design

Platform FPGAEmbedded IP

At Xilinx, we do deep sub-micron design so you

don’t have to.

NBTI

Algorithms

The FPGA Boolean Abstraction

Test

4 Ages 33

Synthesis

RTL

HW / SW partition

Termination

Clock distribution

Noise Margin

Crosstalk

DFM IR drop

Repeaters

Startup init

Transmission lines

Clock generation

ASIC Design

Platform FPGAEmbedded IP

At Xilinx, we do LOGIC design so you don’t have to.

NBTI

Memory hierarchyProcesses CPI

Delay Throughput

Computer Design Network Design

FPGA SystemsEliminating the Boolean Abstraction

Test

Hardware and Software Programmability

Page 34

Zynq: Embedded Processing Platform

Dual ARM Cortex-A9 w/FPU,

L1&L2 Cache, 256KB

Memory, DDR2&DDR3,

ADC…

Up to 235K Programmable

Logic Cells, 400 I/Os,

10.3Gbps transceivers, PCI

Express

AXI between processor and

logic [More than 0,1, delay]

Processor controls FPGA

configuration

– Multiple security levels

supported

– Boot in secure or non-secure mode

– Download PL image via network, SD, USBPage 35

Zynq: Programming

Out-of-the-box SW programmable

– No FPGA design expertise required

Standard OS support

– Dual core ARM A9 base platform

Many Sources of SW and HW IP

– Standardized around AMBA-AXI

– Xilinx, ARM libraries

– 3rd Parties

Industry-Leading Tools

– ARM RVDS Suite & Ecosystem

– Open source GNU tools

– Xilinx ISE® Design Suite

– Xilinx Targeted Design Platforms

Programming

Integrate IP

Test

Release

Debug

Design

Xilinx IP

Partner IP

Custom IP

Integrate IP

Test

Release

Debug

Software

Architect

Hardware

Architect

System

Architect

Page 36

Anatomy of a Targeted Design Platform

Scalable development board

– Enable migration up or down in same FPGA package

– FMC connectors – extend base board functionality, enable

ecosystem

– Pre-configured with working Targeted Reference Design

Targeted Reference Designs

– Optimized for performance and lower resource utilization

– Enable system eval., performance measurement and analysis

Domain optimized design environment

– ISE Design Suite: Embedded Edition

• Hardware design flow and ebedded software development flow

• Advanced connectivity setup and analysis tools

• Support for industry standard AMBA 4 AXI4 interconnect

– Domain-specific tools

Documentation, source code, HW and SW IP cores

Slide 37

@400 MHz

DDR3

GT

P

GT

X

x4 G

en

2/

x8 G

en

1 P

CIe

v2

.0 E

nd

po

int

Co

re

PC

IeW

rap

pe

r

DMA

C2S

S2C

C2S

S2C

MIG

Wra

pp

er

XA

UI

XPS-LL

TEMAC

PC

Iex4/x

8 L

ink

S

Xilinx IP 3rd party IPIntegrated blocks Custom RTL

64-b

it T

RN

@ 2

50 M

Hz

User Space

Registers

Pkt

Control

+CRC

TRN PERFORMANCE

MONITOR

REGISTER

INTERFACE

MU

LT

I-P

OR

T V

IRT

UA

L F

IFO

XAUI

TX

XAUI

RX

RA

W D

AT

A

LO

OP

BA

CK

C2S_Data

C2S_Data

S2C_Data

S2C_Data

C2S_Ctrl

S2C_Ctrl

C2S_Ctrl

S2C_Ctrl

64

64

64

64

64

64

64

64

64 64

64

64

64 64

WR_Data

RD_Data

WR_Data

RD_Data

Control

Control

Control

Control

WR_Data

WR_Data

RD_Data

RD_Data

Control

Control

Control

Control

Data

Data

Control

Control

@250 MHz @250 MHz

@250 MHz

@156.25 MHz @156.25 MHz

Native Interface

of

MIG

64@200 MHz

256

256

Pkt

Control

+CRC

Control

Control

@250 MHz @250 MHz

@400 MHz

DDR3

GT

P

GT

X

x4 G

en

2/

x8 G

en

1 P

CIe

v2

.0 E

nd

po

int

Co

re

PC

IeW

rap

pe

r

DMA

C2S

S2C

C2S

S2C

MIG

Wra

pp

er

XA

UI

XPS-LL

TEMAC

PC

Iex4/x

8 L

ink

S

Xilinx IP 3rd party IPIntegrated blocks Custom RTL

64-b

it T

RN

@ 2

50 M

Hz

User Space

Registers

Pkt

Control

+CRC

TRN PERFORMANCE

MONITOR

REGISTER

INTERFACE

MU

LT

I-P

OR

T V

IRT

UA

L F

IFO

XAUI

TX

XAUI

RX

RA

W D

AT

A

LO

OP

BA

CK

C2S_Data

C2S_Data

S2C_Data

S2C_Data

C2S_Ctrl

S2C_Ctrl

C2S_Ctrl

S2C_Ctrl

64

64

64

64

64

64

64

64

64 64

64

64

64 64

WR_Data

RD_Data

WR_Data

RD_Data

Control

Control

Control

Control

WR_Data

WR_Data

RD_Data

RD_Data

Control

Control

Control

Control

Data

Data

Control

Control

@250 MHz @250 MHz

@250 MHz

@156.25 MHz @156.25 MHz

Native Interface

of

MIG

64@200 MHz

256

256

Pkt

Control

+CRC

Control

Control

@250 MHz @250 MHz

GT

P

GT

X

x4 G

en

2/

x8 G

en

1 P

CIe

v2

.0 E

nd

po

int

Co

re

PC

IeW

rap

pe

r

DMA

C2S

S2C

C2S

S2C

MIG

Wra

pp

er

XA

UI

XPS-LL

TEMAC

PC

Iex4/x

8 L

ink

S

Xilinx IP 3rd party IPIntegrated blocks Custom RTL

64-b

it T

RN

@ 2

50 M

Hz

User Space

Registers

Pkt

Control

+CRC

TRN PERFORMANCE

MONITOR

REGISTER

INTERFACE

MU

LT

I-P

OR

T V

IRT

UA

L F

IFO

XAUI

TX

XAUI

RX

RA

W D

AT

A

LO

OP

BA

CK

C2S_Data

C2S_Data

S2C_Data

S2C_Data

C2S_Ctrl

S2C_Ctrl

C2S_Ctrl

S2C_Ctrl

64

64

64

64

64

64

64

64

64 64

64

64

64 64

WR_Data

RD_Data

WR_Data

RD_Data

Control

Control

Control

Control

WR_Data

WR_Data

RD_Data

RD_Data

Control

Control

Control

Control

Data

Data

Control

Control

@250 MHz @250 MHz

@250 MHz

@156.25 MHz @156.25 MHz

Native Interface

of

MIG

64@200 MHz

256

256

Pkt

Control

+CRC

Control

Control

@250 MHz @250 MHz

AutoESL AutoPilot C to Gates

Page 38

In 2010, BDTI optical flow

benchmark showed quality of output

comparable to manual design.

“In our test of Man vs. Machine;

Machine won hands down! We were

able to create and verify complex

matrix inverse in 5 days vs. 3

months; Algorithm to FPGA speed &

QoR is unbelievable. If I did not

verify in hardware I would think the

tool is lying. “ —MilAero Company

Moore

Page 39

Moore’s Law is still delivering transistor count

– Performance limited by power

– Technology is increasingly difficult to master

Digital wins

Custom silicon getting prohibitively expensive

(“Moore’s Second Law”)

– Custom silicon too expensive for a rising fraction of

applications

Programmable wins

More than More than Moore

Page 40

Addressing the design gap

Devices

Tools

Boards

IP Libraries

Conclusions

void core (

int n, // input size

float *data_in1, // input stream

float *data_in2, // input stream

float *data_out // output stream

) {

int i, j=0;

for (i=0; i<n; i++)

data_out[i] = data_in1[i] + data_in2[i];

}

Programmable logic vendors are the

technology leaders

– [M] Shipping 28nm Technology

– [MtM] 3D Technology

– [MtMtM] Design efficiency: devices, IP, SW

Thank You

top related