TRANSCRIPT
Topology-Aware Buffer Insertion and GPU-Based Massively Parallel
Rerouting for ECO Timing Optimization
Yen-Hung Lin, Yun-Jian Lo, Hian-Syun Tong, Wen-Hao Liu, Yih-Lang Li
Department of Computer Science, NCTU
ASPDAC 2012
Outline
• Introduction
• Preliminaries
• Problem formulation
• Proposed algorithms
• Experimental results
• Conclusions
Introduction
• Precise timing information for critical paths/sinks with delay violations is only available after the P&R stage.
– Re-design is time-consuming.
• Engineering change orders (ECOs) can be used to fix timing violations after P&R.
– Using spare cells with re-routing.
Introduction (cont.)
• Conventional timing ECO algorithms focus on improving the delay of one timing path at a time.
– [3] considered one two-pin net in the timing path but neglected the multi-pin net topology when selecting inserted buffers.
– [4] considered the positions of the multiple pins of a net but did not consider the net topology of detailed routing paths.
• Optimizing only the delay of the critical sink by treating a multi-pin net as a two-pin net may degrade the delays of other sinks of the same net.
– This sequentially worsens other timing-violating paths.
The effect of topology
Introduction (cont.)
• Besides, detailed routing is time-consuming.
– Greedily choosing the inserted buffer and its connections may fall into suboptimal solutions.
– Sequentially investigating each reconnection to the newly inserted buffer requires unacceptable detailed-rerouting runtime.
• Parallel routing can reduce the runtime.
– GPUs provide high computing power at low cost.
Preliminaries
• ECO timing optimization
– Inserts one buffer between two gates on the timing path by breaking the original interconnection and re-wiring the gates to the inserted buffer.
• Delay is estimated with the Elmore delay model.
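The Elmore model estimates the delay to each sink of an RC tree as the sum, over the edges on the path from the source, of each edge's resistance times the total capacitance downstream of that edge. A minimal sketch, with illustrative R/C values that are not from the paper:

```python
# Elmore delay on an RC tree. Each node stores (parent, R, C), where R is the
# resistance of the wire from its parent and C is its lumped capacitance.
# Delay to a node = delay to its parent + R * (total capacitance below the edge).

def elmore(tree, root):
    """tree: {node: (parent, R, C)}; the root's parent is None. Returns delays."""
    children = {}
    for n, (p, _, _) in tree.items():
        if p is not None:
            children.setdefault(p, []).append(n)

    def cdown(n):  # total capacitance at or below node n
        _, _, c = tree[n]
        return c + sum(cdown(k) for k in children.get(n, []))

    delay = {root: 0.0}
    def walk(n):
        for k in children.get(n, []):
            _, r, _ = tree[k]
            delay[k] = delay[n] + r * cdown(k)
            walk(k)
    walk(root)
    return delay

# Hypothetical star net: source s drives sinks a and b through a T-point t.
net = {"s": (None, 0.0, 0.0),
       "t": ("s", 2.0, 1.0),
       "a": ("t", 1.0, 1.0),
       "b": ("t", 3.0, 2.0)}
delays = elmore(net, "s")  # t: 2*(1+1+2)=8, a: 8+1*1=9, b: 8+3*2=14
```

Note how the shared segment s–t charges the capacitance of both sinks, which is exactly why a buffer that decouples part of the net can change every sink's delay.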
Problem formulation
• Given
– A routed design (D), a buffer set (B), a routed net set (NALL), a routed net (N) belonging to NALL with an edge set (E), a pin set (P), and a violation pin set (VP).
• Objective
– Insert one buffer from B into N such that the topology of N is changed and the arrival times of the sinks in VP are minimized without creating additional violated sinks.
Topology-Aware Buffer Insertion (BI) & Topology Restructuring
Proposed algorithms
• Buffering Pair scoring (BP).
• Edge Breaking and buffer connection (EB).
• Topology Restructuring (TR).
• Node Computing-based Massively Parallel Maze Routing (NCMPMR).
Buffering pair scoring
• We want to disregard buffering pairs (BPs) that may potentially worsen the delay of some sinks.
– In other words, invalid BPs are ignored.
• For each valid BP, the Elmore delay model is then used to compute the delay difference of all sinks in VP.
• Wire length is estimated by the Manhattan distance.
Buffering pair scoring (cont.)
• One term is the delay difference of VP after a buffering pair is selected for BI.
• A second term is the delay difference of VP after topology restructuring around the inserted buffer.
• Each violation sink is assigned a weight.
• The score of each buffering pair combines these delay differences with the sink weights.
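A hypothetical sketch of the scoring step, assuming the score is a weighted sum of per-sink delay improvements (the slide's exact formula is not captured here); the names and numbers are illustrative, not from the paper:

```python
# Buffering-pair (BP) scoring sketch: each BP maps every violation sink to its
# estimated delay improvement (from the Elmore model). A BP that worsens any
# sink is invalid and ignored; valid BPs get a weighted sum over the sinks.

def score_bps(bps, weights):
    """bps: {bp_name: {sink: delay_improvement}}; returns scores of valid BPs."""
    scores = {}
    for bp, impr in bps.items():
        if any(d < 0 for d in impr.values()):   # would worsen a sink: invalid
            continue
        scores[bp] = sum(weights[s] * d for s, d in impr.items())
    return scores

weights = {"sink1": 2.0, "sink2": 1.0}          # more critical sink weighs more
bps = {"bp_a": {"sink1": 3.0, "sink2": 1.0},    # improves both sinks
       "bp_b": {"sink1": 5.0, "sink2": -0.5}}   # worsens sink2: invalid
scores = score_bps(bps, weights)                # {'bp_a': 7.0}
```

Filtering the invalid BPs first is what prevents the sequential worsening of other violation paths mentioned in the introduction.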
Edge breaking and buffer connection
Edge breaking and buffer connection (cont.)
• A buffering pair can improve the delays of all sinks in one of the resulting subnets when the first inequality on the slide is satisfied.
• Likewise, it can improve the delays of all sinks in the other subnet when the second inequality is satisfied.
(Figure: labels 1–5 mark the delay components compared by the inequalities — (1) the changed delay of the broken connection, (2)–(4) the delays to, through, and from the inserted buffer, and (5) the previous delay of the connection.)
Topology restructuring
• Since the net is partitioned by the inserted buffer, the topology can be changed for further delay improvement.
• Two observations:
– Reconnecting the affected subnet to the inserted buffer with the shortest wires can improve the delays of all of its sinks.
– Separating it into two subnets and reconnecting them to the inserted buffer with the shortest wires can improve the arrival times of all of their sinks.
The overall flow
Node Computing-based Massively Parallel Maze Routing
| Routing cost | Iteration 0 | Iteration 1 | Iteration 2 | Iteration 3 |
|---|---|---|---|---|
| w[A, i] | 0 | 0 | 0 | 0 |
| w[B, i] | Inf | 1 | 1 | 1 |
| w[C, i] | Inf | 5 | 5 | 4 |
| w[D, i] | Inf | Inf | 3 | 3 |
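The iteration-by-iteration behavior above is a synchronous, per-node relaxation: every node simultaneously recomputes its cost from its neighbors' costs of the previous iteration, which is what one GPU thread per node would do. A minimal sketch on an assumed graph chosen so that it reproduces the example table (the actual routing graph is not given in this transcript):

```python
# Node-computing maze routing as synchronous (Jacobi-style) relaxation.
# Each iteration, all nodes read the previous iteration's costs and take the
# minimum over their neighbors; this is the per-node kernel a GPU would run.

INF = float("inf")

def relax_all(edges, source, iters):
    """edges: {(u, v): weight} on an undirected grid; returns cost snapshots."""
    nodes = {n for e in edges for n in e}
    adj = {n: [] for n in nodes}
    for (u, v), w in edges.items():
        adj[u].append((v, w))
        adj[v].append((u, w))
    cost = {n: (0 if n == source else INF) for n in nodes}
    history = [dict(cost)]
    for _ in range(iters):
        # new dict is built from the old one, so all updates are synchronous
        cost = {n: min([cost[n]] + [cost[m] + w for m, w in adj[n]])
                for n in nodes}
        history.append(dict(cost))
    return history

# Assumed edge weights that reproduce the table: A-B:1, A-C:5, B-D:2, D-C:1.
edges = {("A", "B"): 1, ("A", "C"): 5, ("B", "D"): 2, ("D", "C"): 1}
hist = relax_all(edges, "A", 3)  # hist[3]["C"] drops from 5 to 4 via D
```

Unlike a sequential wavefront maze router, every node's update is independent within an iteration, so the loop body maps directly onto thousands of GPU threads.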
NCMPMR flow
Speedup and preventing race conditions
• Partition the routing graph into blocks for performance and scalability.
• Stagger adjacent blocks for better performance.
– 2.25x faster.
Experimental results
• Environment
– Platform 1: an AMD Opteron 2.6 GHz workstation with 16 GB memory.
– Platform 2: an Intel Xeon E5520 2.26 GHz with 8 GB memory and a single NVIDIA Tesla C1060 GPU.
• Implemented in C++.
• s35932 from the IWLS benchmarks with 300 additional spare cells.
• Five nets, N1–N5, with various pin degrees are selected from s35932 for demonstration.
Critical sink delay improvement
Analysis
• The following results are on platform 2.
Conclusions
• This work develops a topology-aware ECO timing optimization algorithm flow.
– BP, EB, TR.
– GPU-based re-routing.
• It improves WNS and TNS significantly, with a 7.72x average runtime speedup over conventional 2-pin-net-based buffer insertion.