an fpga-based accelerator for detailed maze routingnestorj/reports/fpga07_poster_pg.pdf · ni si xo...

L4 Organization} Key Idea: 2-D Array of Simple PEs} Control unit broadcasts commands to PEs} Reduces O(N2) expansion step to O(N)

} PE Connection Detail

} PE Implementation for Multilayer Routing} Layers processed from bottom to top} Expands all grids on current layer in parallel

} PE States (for each grid)

} PE Commands

An FPGA-Based Accelerator for Detailed Maze RoutingJohn A. Nestor and Jeremy LavineDepartment of Electrical and Computer Engineering

Lafayette CollegeEaston, Pennsylvania 18042

[email protected]

AbstractThis paper describes an FPGA-based accelerator for maze routingapplications such as integrated circuit detailed routing. The acceleratorefficiently supports multiple layers, multi-terminal nets, and rip up andreroute. By time-multiplexing multiple layers over a two-dimensional arrayof processing elements, this approach can support grids large enough forpractical detailed routing while providing at 1-2 orders of magnitudespeedup over software running on a modern desktop computer. Thecurrent implementation supports 32 X 32 routing grids with up to 16 layersin a single Xilinx XC2V6000 FPGA. Up to 64 X 64 routing grids arefeasible in larger commercially available FPGAs. Performancemeasurements (including interface overhead) show a speedup of 29X-93Xover software running on a 3.79GHz Pentium Xeon desktop computerdepending on the number of layers used. An improved interface designcould yield significantly larger speedups.

Implementation} 32 X 32 accelerator supports 1-16 layers} Host FPGA: Xilinx XC2V6000} Board: Dini 3000K10S w/ PCI Interface} Host: 1.8GHz Pentium 4 Linux PC

Implementation Results} “Adjacent” route (0,0,0 - 0,0,1)

} “Corner” route (0,0,0 - 31,31,L-1)

} 90 Random 2-Terminal Nets

} 40 Random Multi-Terminal Nets

Additional Features} Multi-terminal Net Routing

} Etching - Identification of Ripup Sets

Column Dec.

Row Dec.

ControlUnit

PE status out

PE command,status in

PCITarget

routing commands

routingresults

rs1

rs2

cs1 cs2

PE Array

FIFO

FIFO

HostComputer

EI WI

NI

SI

XO

XO

XO

XO SI

XO

WI XO

NI XO

EI XO

SEL CMD

STO 3

2

North Neighbor

South Neighbor

RSEL

CSEL

EastNeighbor

WestNeighbor

CMD

STATE OUTclk

PREF

pref

STI

STATE IN

Broadcast to all PEsfrom Control Unit

Logical AND of PE outputsto Control Unit

3

D QXL

XH/TOP

CLK

SEQUENCERST0

SREG

NSCSHI

LI NI SI EI WI

RSEL CSEL CMD

ST

to adjacent cells

fromdecoders CMD

STATUS

PFV

EN

1 0

PF

PFEWPFNS

EN

XO

D Q

ETCHETCH

T1

T3

T21

1 2

2

2

3

3

3

3

4

4

4

T1

T3

T2

T1T2

T1

T3

T21 1 1

1111

2 2 2 2T3

T1

T3

T2

T1

T3

T2

(a) (b) (c)

(f)(e)(d)

1S1

T1

T2

(a) (b) (c)

(f)(e)

S21

2

1

T2S21

2

S1

T1

3

3

1

T2S21

2

S1

T1

3

3

4

1

S21

2

S1

T1

3

3 4

T2S2S1

T1

S1

T1

T2S2

(d)

4T2

EMPTY Cell unoccupied and unexpandedBLOCKED Cell occupied by routed netXE Expanded - shortest backtrace path to east XW Expanded - shortest backtrace path to west XN Expanded - shortest backtrace path to north XS Expanded - shortest backtrace path to south XU Expanded - shortest backtrace path up XD Expanded - shortest backtrace path down

READ Return state of selected cell(s)WRITE Write state of selected cell(s)

EXPAND

IF EMPTY or (BLOCKED and ETCH enabled)!, AND a neighboring cell is expandedTHEN enter corresponding expand state (XN, …)

CLEARX

Reset expanded cells to EMPTY state Reset etched cells to BLOCKED state

Component Source LUTs FFsPCI Target (from Dini) 529 164 223CMD FIFO (Xilinx IP) IP 78 126Result FIFO (Xilinx IP) IP 78 126PE Array 749 64,838 11,268Column Decoder 94 211 32Row Decoder 94 211 32Control Unit 1,103 362 129FPGA Top Level 272 38 39TOTAL 2,841 65,980 11,975

Layers SW (µs) L4 (µs) Speedup6 Route 85,585.41 2,964.37 28.87

Ripup 80.64 1,931.47 0.04Comb. 85,666.05 4,895.84 17.49

8 Route 119,321.59 3,110.83 38.36Ripup 74.62 1,853.49 0.04Comb. 119,396.20 4,964.32 24.05

16 Route 253,031.40 3,545.12 71.37Ripup 67.95 1,516.95 0.04Comb. 253,099.35 5,062.07 49.99

Layers SW (µs) L4 (µs) Speedup6 Route 108,112.72 2,932.49 36.87

Ripup 70.77 1,760.93 0.04Comb. 108,183.49 4,693.42 23.05

8 Route 150,573.09 3,050.88 49.35Ripup 69.44 1,700.88 0.04Comb. 150,642.53 4,751.76 31.70

16 Route 324,452.70 3,468.57 93.54Ripup 61.58 1,406.43 0.04Comb. 324,514.28 4,875.00 66.56

Software measurements: Using P4 cycle counter on 3.79GHz P4 Xeon EM64TL4 measurements: Using P4 cycle counter on 1.8GHz P4 with Dini DN3000K10S PCI card / XC2V6000 FPGA

Layers SW (µs) L4 (µs) Speedup6 Route 62.90 12.29 5.12

Ripup 0.11 3.26 0.03Comb. 63.01 15.55 4.05

8 Route 82.49 12.35 6.68Ripup 0.12 3.21 0.04Comb. 82.61 15.56 5.31

16 Route 162.20 13.38 12.12Ripup 0.11 3.21 0.03Comb. 162.31 16.59 9.78

Layers SW (µs) L4 (µs) Speedup6 Route 2,170.52 32.58 66.62

Ripup 0.79 9.82 0.08Comb. 2,171.31 42.4 51.21

8 Route 2,905.91 37.71 77.06Ripup 0.79 9.62 0.08Comb. 2,906.70 47.33 61.41

16 Route 5,849.52 66.77 87.61Ripup 0.86 9.8 0.09Comb. 5,850.38 76.57 76.41

an fpga-based accelerator for detailed maze routingnestorj/reports/fpga07_poster_pg.pdf · ni si xo...

Documents