an fpga-based accelerator for detailed maze routingnestorj/reports/fpga07_poster_pg.pdf · ni si xo...

L4 Organization
• Key idea: 2-D array of simple processing elements (PEs)
• Control unit broadcasts commands to PEs
• Reduces the O(N²) expansion step to O(N)
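The complexity claim can be illustrated with a small software simulation. This is a minimal sketch, not the accelerator's logic: it assumes N is the side length of an N x N single-layer grid, and the function name `parallel_expand` is hypothetical. Each loop iteration stands in for one broadcast expansion command, in which every cell adjacent to an expanded cell becomes expanded at once.

```python
def parallel_expand(n, source, blocked=frozenset()):
    """Simulate synchronous wavefront expansion on an n x n grid.

    Each iteration of the outer loop models one broadcast step in which
    all cells update in parallel. Returns (steps, cells_expanded).
    """
    expanded = {source}
    steps = 0
    while True:
        frontier = set()
        for (r, c) in expanded:
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                cell = (r + dr, c + dc)
                inside = 0 <= cell[0] < n and 0 <= cell[1] < n
                if inside and cell not in expanded and cell not in blocked:
                    frontier.add(cell)
        if not frontier:
            # No cell changed in this step: the expansion is complete.
            return steps, len(expanded)
        expanded |= frontier
        steps += 1


steps, cells = parallel_expand(32, (0, 0))
print(steps, cells)
```

For an unobstructed 32 x 32 grid with the source in a corner, the simulation needs 62 parallel steps to reach all 1,024 cells. A sequential router must visit each cell individually, which is where the O(N²) versus O(N) difference comes from.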

• PE connection detail

• PE implementation for multilayer routing
  • Layers processed from bottom to top
  • Expands all grids on current layer in parallel

• PE states (for each grid)

• PE commands

An FPGA-Based Accelerator for Detailed Maze Routing
John A. Nestor and Jeremy Lavine
Department of Electrical and Computer Engineering
Lafayette College
Easton, Pennsylvania 18042
nestorj@lafayette.edu

Abstract
This paper describes an FPGA-based accelerator for maze routing applications such as integrated circuit detailed routing. The accelerator efficiently supports multiple layers, multi-terminal nets, and rip up and reroute. By time-multiplexing multiple layers over a two-dimensional array of processing elements, this approach can support grids large enough for practical detailed routing while providing 1-2 orders of magnitude speedup over software running on a modern desktop computer. The current implementation supports 32 X 32 routing grids with up to 16 layers in a single Xilinx XC2V6000 FPGA. Routing grids of up to 64 X 64 are feasible in larger commercially available FPGAs. Performance measurements (including interface overhead) show a speedup of 29X-93X over software running on a 3.79GHz Pentium Xeon desktop computer, depending on the number of layers used. An improved interface design could yield significantly larger speedups.

Implementation
• 32 X 32 accelerator supports 1-16 layers
• Host FPGA: Xilinx XC2V6000
• Board: Dini DN3000K10S w/ PCI interface
• Host: 1.8GHz Pentium 4 Linux PC

Implementation Results
• “Adjacent” route (0,0,0 - 0,0,1)
• “Corner” route (0,0,0 - 31,31,L-1)
• 90 random 2-terminal nets
• 40 random multi-terminal nets

Additional Features
• Multi-terminal net routing
• Etching - identification of ripup sets

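[Figure: L4 system organization. Blocks: Host Computer, PCI Target, Control Unit, Row Decoder, Column Decoder, PE Array. Signals: routing commands, routing results, PE command, PE status in, PE status out, cs1, cs2.]

[Figure: PE connection detail. Labels: SEL, CMD, STATE IN, STATE OUT, clk, North Neighbor, South Neighbor, East Neighbor, West Neighbor. Annotations: "Broadcast to all PEs from Control Unit"; "Logical AND of PE outputs to Control Unit".]

[Figure: PE implementation for multilayer routing. Labels: XH/TOP, SEQUENCER, ST0, NS, CS, HI, LI, NI, SI, EI, WI, RSEL, CSEL, CMD (from decoders), STATUS, PFEW, PFNS, ETCH, "to adjacent cells".]

[Figures: multi-terminal net routing and etching examples, panels (a)-(f). Labels: T2, T3, S1, S2, and expansion numbers 1 and 2.]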

EMPTY    Cell unoccupied and unexpanded
BLOCKED  Cell occupied by routed net
XE       Expanded - shortest backtrace path to east
XW       Expanded - shortest backtrace path to west
XN       Expanded - shortest backtrace path to north
XS       Expanded - shortest backtrace path to south
XU       Expanded - shortest backtrace path up
XD       Expanded - shortest backtrace path down
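Because each expanded state records the direction of the shortest path back toward the source, a route can be recovered by following those directions from the target. The sketch below shows one way host software might do this after reading cell states. It is an illustration under stated assumptions: the coordinate convention (east = +x, north = +y, up = +z), the dictionary-based grid, and the names `BACKTRACE_STEP` and `backtrace` are hypothetical and do not come from the poster.

```python
# Assumed mapping from expanded state to the (dx, dy, dz) step that moves
# one cell back toward the source. The axis convention is an assumption.
BACKTRACE_STEP = {
    "XE": (1, 0, 0),
    "XW": (-1, 0, 0),
    "XN": (0, 1, 0),
    "XS": (0, -1, 0),
    "XU": (0, 0, 1),
    "XD": (0, 0, -1),
}


def backtrace(states, target, source):
    """Follow backtrace directions from target to source.

    `states` maps (x, y, z) to a state name. A KeyError means the walk
    reached a cell that is not in an expanded state.
    """
    path = [target]
    cell = target
    while cell != source:
        dx, dy, dz = BACKTRACE_STEP[states[cell]]
        cell = (cell[0] + dx, cell[1] + dy, cell[2] + dz)
        path.append(cell)
    return path


example = {(1, 0, 0): "XW", (1, 1, 0): "XS", (1, 1, 1): "XD"}
print(backtrace(example, (1, 1, 1), (0, 0, 0)))
```

In this example the target at (1,1,1) steps down, then south, then west to reach the source at (0,0,0).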

READ    Return state of selected cell(s)
WRITE   Write state of selected cell(s)
EXPAND  IF EMPTY or (BLOCKED and ETCH enabled), AND a neighboring cell is expanded, THEN enter corresponding expand state (XN, ...)
CLEARX  Reset expanded cells to EMPTY state; reset etched cells to BLOCKED state
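The EXPAND and CLEARX rules can be modeled in software for a single layer. This is a rough sketch of the stated rules, not the PE hardware. Several details are assumptions: north is taken as row - 1, the order in which neighbors are checked is arbitrary, etched cells are tracked in a separate set, and the source cell is seeded by writing an expanded state into it. The names `expand` and `clearx` are hypothetical.

```python
EMPTY, BLOCKED = "EMPTY", "BLOCKED"

# (row offset, col offset) of a neighbor -> state entered when that neighbor
# is expanded. The check order is an assumption, not taken from the poster.
NEIGHBORS = {(0, 1): "XE", (0, -1): "XW", (-1, 0): "XN", (1, 0): "XS"}


def is_expanded(state):
    return state.startswith("X")


def expand(grid, etched, etch_enabled=False):
    """Apply one broadcast EXPAND; every cell updates from the old state."""
    old = [row[:] for row in grid]
    rows, cols = len(grid), len(grid[0])
    changed = False
    for r in range(rows):
        for c in range(cols):
            state = old[r][c]
            if not (state == EMPTY or (state == BLOCKED and etch_enabled)):
                continue
            for (dr, dc), new_state in NEIGHBORS.items():
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and is_expanded(old[nr][nc]):
                    grid[r][c] = new_state
                    if state == BLOCKED:
                        etched.add((r, c))  # remember which cells were etched
                    changed = True
                    break
    return changed


def clearx(grid, etched):
    """Reset expanded cells to EMPTY and etched cells to BLOCKED."""
    for r, row in enumerate(grid):
        for c, state in enumerate(row):
            if is_expanded(state):
                grid[r][c] = BLOCKED if (r, c) in etched else EMPTY
    etched.clear()
```

Starting from a one-row grid `[["XN", EMPTY, BLOCKED]]`, a plain `expand` turns the middle cell into `XW` and leaves the blocked cell alone. Repeating with `etch_enabled=True` expands into the blocked cell as well, and `clearx` then restores it to `BLOCKED`.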

Component                 Source   LUTs     FFs
PCI Target (from Dini)    529      164      223
CMD FIFO (Xilinx IP)      IP       78       126
Result FIFO (Xilinx IP)   IP       78       126
PE Array                  749      64,838   11,268
Column Decoder            94       211      32
Row Decoder               94       211      32
Control Unit              1,103    362      129
FPGA Top Level            272      38       39
TOTAL                     2,841    65,980   11,975

Layers          SW (µs)       L4 (µs)     Speedup
6       Route    85,585.41    2,964.37    28.87
        Ripup        80.64    1,931.47     0.04
        Comb.    85,666.05    4,895.84    17.49
8       Route   119,321.59    3,110.83    38.36
        Ripup        74.62    1,853.49     0.04
        Comb.   119,396.20    4,964.32    24.05
16      Route   253,031.40    3,545.12    71.37
        Ripup        67.95    1,516.95     0.04
        Comb.   253,099.35    5,062.07    49.99
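The relationship between the columns can be checked with a few lines of arithmetic, using the 6-layer rows of the table above. The sketch assumes that speedup is the software time divided by the L4 time, and that the combined row is the sum of the route and ripup rows. Both assumptions are consistent with the published figures to within about 0.01. The helper name `speedup` is hypothetical.

```python
def speedup(sw_us, l4_us):
    """Speedup of the accelerator over software for one measurement."""
    return sw_us / l4_us


# 6-layer rows from the table above: (software µs, L4 µs)
route = (85585.41, 2964.37)
ripup = (80.64, 1931.47)

# Combined time is route time plus ripup time on each platform.
comb = (route[0] + ripup[0], route[1] + ripup[1])

print(f"route    {speedup(*route):.3f}")
print(f"ripup    {speedup(*ripup):.3f}")
print(f"combined {speedup(*comb):.3f}")
```

Ripup takes longer on the accelerator than in software (speedup of about 0.04), so it makes up a large share of the combined L4 time and pulls the combined speedup well below the route-only figure.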

Layers          SW (µs)       L4 (µs)     Speedup
6       Route   108,112.72    2,932.49    36.87
        Ripup        70.77    1,760.93     0.04
        Comb.   108,183.49    4,693.42    23.05
8       Route   150,573.09    3,050.88    49.35
        Ripup        69.44    1,700.88     0.04
        Comb.   150,642.53    4,751.76    31.70
16      Route   324,452.70    3,468.57    93.54
        Ripup        61.58    1,406.43     0.04
        Comb.   324,514.28    4,875.00    66.56

Software measurements: Using P4 cycle counter on 3.79GHz P4 Xeon EM64T
L4 measurements: Using P4 cycle counter on 1.8GHz P4 with Dini DN3000K10S PCI card / XC2V6000 FPGA

Layers          SW (µs)    L4 (µs)    Speedup
6       Route    62.90      12.29      5.12
        Ripup     0.11       3.26      0.03
        Comb.    63.01      15.55      4.05
8       Route    82.49      12.35      6.68
        Ripup     0.12       3.21      0.04
        Comb.    82.61      15.56      5.31
16      Route   162.20      13.38     12.12
        Ripup     0.11       3.21      0.03
        Comb.   162.31      16.59      9.78

Layers          SW (µs)     L4 (µs)    Speedup
6       Route   2,170.52     32.58     66.62
        Ripup       0.79      9.82      0.08
        Comb.   2,171.31     42.40     51.21
8       Route   2,905.91     37.71     77.06
        Ripup       0.79      9.62      0.08
        Comb.   2,906.70     47.33     61.41
16      Route   5,849.52     66.77     87.61
        Ripup       0.86      9.80      0.09
        Comb.   5,850.38     76.57     76.41
