Post on 26-Mar-2015

A REVISIT TO THE PRIMAL-DUAL BASED CLOCK SKEW SCHEDULING ALGORITHM

Min Ni and Seda Ogrenci Memik

EECS Department, Northwestern University

AGENDA

Introduction
Related Work
The Primal-Dual Algorithm
  The existing primal-dual approach
  Our enhanced implementation
Experimental Results
Conclusion

INTRODUCTION

The Problem of Clock Skew Scheduling

[Figure: constraint graph]

$$L_i - L_j \le t_{ij}$$

$$L_i - L_j + T_{ji} \le P$$

MINIMIZE P
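As a small illustration of the problem statement, the sketch below evaluates one fixed set of latencies: it returns the smallest period that satisfies every setup constraint and checks the hold constraints. It assumes the sign convention L_i - L_j + T <= P for setup edges and L_i - L_j <= t for hold edges. The function names and the `(i, j, delay)` edge tuples are hypothetical, not taken from the slides.

```python
def min_period_for_latencies(latency, setup_edges):
    """Smallest P with L_i - L_j + T <= P on every setup edge (assumed convention)."""
    return max(latency[i] - latency[j] + T for i, j, T in setup_edges)


def hold_feasible(latency, hold_edges):
    """True if L_i - L_j <= t holds on every hold edge (assumed convention)."""
    return all(latency[i] - latency[j] <= t for i, j, t in hold_edges)


# Hypothetical two-register example: (i, j, delay) per constraint edge.
setup_edges = [(0, 1, 5.0), (1, 0, 3.0)]
hold_edges = [(0, 1, 2.0), (1, 0, 2.0)]

zero_skew = {0: 0.0, 1: 0.0}
skewed = {0: 0.0, 1: 1.0}

print(min_period_for_latencies(zero_skew, setup_edges))  # 5.0
print(min_period_for_latencies(skewed, setup_edges))     # 4.0
print(hold_feasible(skewed, hold_edges))                 # True
```

In this toy instance, shifting one latency by a single time unit lowers the achievable period from 5.0 to 4.0, which is the effect clock skew scheduling exploits.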

RELATED WORK

Existing Approaches for Solving Clock Skew Scheduling

Linear programming

Binary search with iterative shortest path problem: O(|V||E|log(C/n))

Primal-dual based algorithm (Burns’): O(|V|^2|E|)
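The binary-search approach can be sketched as follows. For a trial period, every constraint of the form L_i - L_j <= w becomes an edge j -> i of weight w, and the period is feasible exactly when that graph has no negative cycle, which Bellman-Ford detects. This is a minimal sketch under an assumed sign convention (setup: L_i - L_j + T <= P, hold: L_i - L_j <= t); the function names, edge tuples, and search bounds are hypothetical.

```python
def feasible(n, setup_edges, hold_edges, period):
    """Bellman-Ford feasibility test for a trial clock period."""
    # Constraint L_i - L_j <= w  ->  edge j -> i with weight w.
    edges = [(j, i, period - T) for i, j, T in setup_edges]
    edges += [(j, i, t) for i, j, t in hold_edges]
    dist = [0.0] * n  # as if a virtual source reached every node at cost 0
    for _ in range(n):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v] - 1e-12:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            return True   # distances converged: constraints are satisfiable
    return False          # still relaxing after n passes: negative cycle


def min_period(n, setup_edges, hold_edges, lo, hi, tol=1e-6):
    """Binary search for the smallest feasible period in [lo, hi]."""
    if not feasible(n, setup_edges, hold_edges, hi):
        raise ValueError("upper bound is not feasible")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if feasible(n, setup_edges, hold_edges, mid):
            hi = mid
        else:
            lo = mid
    return hi


# Hypothetical two-register example.
setup = [(0, 1, 5.0), (1, 0, 3.0)]
hold = [(0, 1, 2.0), (1, 0, 2.0)]
print(round(min_period(2, setup, hold, 0.0, 10.0), 3))  # 4.0
```

Each feasibility test costs one Bellman-Ford run, and the number of tests grows with the logarithm of the search range divided by the tolerance, which is where the logarithmic factor in this approach's bound comes from.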

THE PRIMAL-DUAL APPROACH

Theory of the Primal-Dual Algorithm

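PRIMAL:

$$
\begin{aligned}
\min\quad & P \\
\text{s.t.}\quad & L_i - L_j + T_{ji} \le P, && \forall (i,j) \in E_s \\
& L_i - L_j \le t_{ij}, && \forall (i,j) \in E_h
\end{aligned}
$$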

Complementary slackness theorem: starting from a feasible solution of PRIMAL, find a feasible solution of DUAL; both are optimal if the complementary slackness conditions are met.

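DUAL:

$$
\begin{aligned}
\max\quad & \sum_{(i,j) \in E_s} T_{ji}\,\mu_{ij} - \sum_{(i,j) \in E_h} t_{ij}\,\lambda_{ij} \\
\text{s.t.}\quad & \sum_{(i,j) \in E_s} \mu_{ij} - \sum_{(j,i) \in E_s} \mu_{ji} + \sum_{(i,j) \in E_h} \lambda_{ij} - \sum_{(j,i) \in E_h} \lambda_{ji} = 0, && \forall i \in V \\
& \sum_{(i,j) \in E_s} \mu_{ij} = 1 \\
& \mu_{ij} \ge 0,\ \lambda_{ij} \ge 0
\end{aligned}
$$

Dual variables: $\mu_{ij}$ (setup edges), $\lambda_{ij}$ (hold edges)

Primal variables: $L_i$, $P$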

PRIMAL-DUAL APPROACH

The complementary slackness conditions

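$$
\begin{aligned}
\mu_{ij}\,(L_i - L_j + T_{ji} - P) &= 0 \\
\lambda_{ij}\,(L_i - L_j - t_{ij}) &= 0 \\
P\,\Bigl(1 - \sum_{(i,j) \in E_s} \mu_{ij}\Bigr) &= 0 \\
\sum_{(i,j) \in E_s} \mu_{ij} - \sum_{(j,i) \in E_s} \mu_{ji} + \sum_{(i,j) \in E_h} \lambda_{ij} - \sum_{(j,i) \in E_h} \lambda_{ji} &= 0
\end{aligned}
$$

Here $\mu_{ij}$ and $\lambda_{ij}$ are the dual variables of the setup and hold constraints, respectively.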

General format: variable times constraints

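Starting from a feasible solution {$L_i$, $P$}, if we can also find a feasible solution {$\mu_{ij}$, $\lambda_{ij}$} to the above system of linear equations, the feasible solution is optimal.

If a dual variable $\mu_{ij}$ or $\lambda_{ij}$ is > 0, then the slack of its constraint must be zero; edges whose constraint slack = 0 are called admissible edges.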
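The admissibility test can be illustrated with a short sketch: compute the slack of every constraint at the current latencies and period, and keep the edges whose slack is zero within a tolerance. The sign convention (setup: L_i - L_j + T <= P, hold: L_i - L_j <= t), the function name, and the tuple layout are assumptions made for this example.

```python
def admissible_edges(latency, period, setup_edges, hold_edges, eps=1e-9):
    """Return the constraint edges that are tight (zero slack) at (latency, period)."""
    admissible = []
    for i, j, T in setup_edges:
        slack = period - (latency[i] - latency[j] + T)
        if abs(slack) <= eps:
            admissible.append((i, j, "setup"))
    for i, j, t in hold_edges:
        slack = t - (latency[i] - latency[j])
        if abs(slack) <= eps:
            admissible.append((i, j, "hold"))
    return admissible


# Hypothetical two-register example at P = 4.0.
setup_edges = [(0, 1, 5.0), (1, 0, 3.0)]
hold_edges = [(0, 1, 2.0), (1, 0, 2.0)]
latency = {0: 0.0, 1: 1.0}

print(admissible_edges(latency, 4.0, setup_edges, hold_edges))
# [(0, 1, 'setup'), (1, 0, 'setup')]
```

In this toy instance the two tight setup edges already form a cycle 0 -> 1 -> 0, so the period cannot be reduced further.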

RESTRICTED DUAL PROBLEM

Solve the system of linear equations on only admissible edges

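$$
\begin{aligned}
1 - \sum_{(i,j) \in E_s} \mu_{ij} &= 0 \\
\sum_{(i,j) \in E_s} \mu_{ij} - \sum_{(j,i) \in E_s} \mu_{ji} + \sum_{(i,j) \in E_h} \lambda_{ij} - \sum_{(j,i) \in E_h} \lambda_{ji} &= 0
\end{aligned}
$$

where $\mu_{ij}$ and $\lambda_{ij}$ are the dual variables on setup and hold edges, and the sums run over admissible edges only.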

This is equivalent to solving the following restricted dual problem

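[Equation: restricted dual problem, a minimization (objective lost in extraction) subject to $\sum_{(i,j) \in E_s} \mu_{ij} = 1$, the flow-conservation equation at every node, and nonnegativity of all variables, with sums taken over admissible edges only]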

If the minimum is 0, then we are done. However, it is still not straightforward to solve because it is formulated on the dual variables.

RESTRICTED PRIMAL PROBLEM

Check on the Restricted Primal Problem

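$$
\begin{aligned}
\max\quad & \epsilon \\
\text{s.t.}\quad & d_i - d_j + \epsilon \le 0 \\
& d_i - d_j \le 0 \\
& \epsilon \le 1
\end{aligned}
$$

Here $d_i$ are the node variables, and $\epsilon$ stands in for the objective variable, whose symbol did not survive extraction. The first constraint is written for admissible setup edges and the second for admissible hold edges.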

It can be proved that this problem has an optimal value of 0 if there exists a cycle in the admissible graph Ga (consisting of admissible edges only).
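One way to see why a cycle of admissible edges certifies optimality is to add the tight constraints around the cycle. The derivation below is a sketch under an assumed sign convention (setup: $L_i - L_j + T_{ji} \le P$, hold: $L_i - L_j \le t_{ij}$), for a directed cycle $C$ in Ga that contains $k \ge 1$ setup edges; the symbols $C$ and $k$ are introduced here for illustration.

```latex
% Every edge of C is admissible, so its constraint holds with equality.
% Summing around the cycle, the latency terms L_i - L_j telescope to zero:
\sum_{(i,j) \in C \cap E_s} \bigl(L_i - L_j + T_{ji} - P\bigr)
  + \sum_{(i,j) \in C \cap E_h} \bigl(L_i - L_j - t_{ij}\bigr) = 0
\;\Longrightarrow\;
P = \frac{1}{k}\Bigl(\sum_{(i,j) \in C \cap E_s} T_{ji}
      - \sum_{(i,j) \in C \cap E_h} t_{ij}\Bigr).

% For any other feasible pair (L', P') the same sum is at most zero:
\sum_{(i,j) \in C \cap E_s} T_{ji} - k P' - \sum_{(i,j) \in C \cap E_h} t_{ij} \le 0
\;\Longrightarrow\;
P' \ge P .
```

So no feasible schedule can have a period below the value fixed by the cycle, which is why the algorithm can stop as soon as a cycle appears.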

PRIMAL-DUAL ALGORITHM

Starting from an empty admissible graph, incrementally reduce the clock period value until a cycle emerges in the admissible graph.

The effect of reducing P is that more edges become admissible and those are inserted into admissible graph Ga.

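$$L_i - L_j - t_{ij} \le 0$$

$$L_i - L_j + T_{ji} - P \le 0$$

An edge is admissible when its constraint holds with equality.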

Two main tasks in the while loop: 1. Find THETA; 2. Maintain Ga.

PRIMAL-DUAL BURNS’ IMPLEMENTATION

A different strategy for maintaining the admissible graph Ga and updating THETA values results in different efficiency.

AN EXAMPLE

5 iterations to find the minimum clock period P by updating the admissible graph and theta value.

[Figure: example graph across the iterations, annotated with the edge that becomes admissible, the theta value, and the skew]

ENHANCED IMPLEMENTATION

Two major sources of overhead in the existing implementation:

Scan through all edges (|E|) in the graph to create the admissible graph Ga from scratch in each iteration;

Calculate theta values for all edges (|E|) in the graph and find the minimum one.

MAINTAINING ADMISSIBLE GRAPH

Theorem: If exactly one minimum theta value edge (i, j) is added into the admissible graph Ga, then Ga is a forest until a cycle is generated.

Add the new admissible edge and remove edges that become non-admissible;

No need to call a negative cycle detection routine; a parent list is maintained instead;

Complexity is |V|, compared with |E| for the same step in Burns’ implementation.
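A parent list makes the cycle check a single walk toward the root. The sketch below is a hypothetical illustration, not the authors' data structure: it assumes each node in the forest has at most one parent, stored in a dictionary, and that adding an edge u -> v makes u the parent of v. The new edge closes a cycle exactly when v is already an ancestor of u (or u itself).

```python
def creates_cycle(parent, u, v):
    """True if making u the parent of v would close a cycle in the forest."""
    node = u
    while node is not None:
        if node == v:
            return True
        node = parent.get(node)  # roots have no entry, so the walk stops
    return False


def add_edge(parent, u, v):
    """Insert edge u -> v if it keeps the forest acyclic; report whether it did."""
    if creates_cycle(parent, u, v):
        return False
    parent[v] = u
    return True


# Hypothetical forest: 0 -> 1 -> 2 (0 is the root).
parent = {1: 0, 2: 1}
print(creates_cycle(parent, 2, 0))  # True: 0 is an ancestor of 2
print(add_edge(parent, 0, 3))       # True: 3 becomes a child of 0
```

The walk visits each ancestor at most once, so a check costs at most |V| steps, in line with the |V| bound stated on the slide.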

EFFICIENT CALCULATION OF THETA

Similar to Dijkstra’s shortest path algorithm, a set of edges is maintained as candidates for shortest path tree edges;

In our problem, we need to find the minimum theta edge to add into Ga;

In Burns’ implementation, all edges are scanned during each iteration; theta values are recalculated for all edges;

We maintain a much smaller set of candidates in a heap; theta values are only recalculated for a subset of this small candidate set;

O(log|V|) for maintaining the heap.
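A candidate heap of this kind can be sketched with Python's `heapq` and lazy invalidation: when the theta value of an edge is recalculated, a fresh entry is pushed and the old one is skipped when it surfaces. The class name, its methods, and the versioning scheme are assumptions made for illustration; the slides do not describe the heap's internals.

```python
import heapq


class CandidateHeap:
    """Min-heap of (theta, edge) candidates with lazy removal of stale entries."""

    def __init__(self):
        self._heap = []
        self._version = {}  # edge -> version number of its live entry

    def push(self, edge, theta):
        """Insert an edge, or update its theta by superseding the old entry."""
        version = self._version.get(edge, 0) + 1
        self._version[edge] = version
        heapq.heappush(self._heap, (theta, version, edge))

    def pop_min(self):
        """Return (edge, theta) with the smallest live theta, or None if empty."""
        while self._heap:
            theta, version, edge = heapq.heappop(self._heap)
            if self._version.get(edge) == version:
                del self._version[edge]
                return edge, theta
        return None


heap = CandidateHeap()
heap.push((0, 1), 3.0)
heap.push((1, 2), 1.5)
heap.push((0, 1), 0.5)   # recalculated theta for edge (0, 1)
print(heap.pop_min())    # ((0, 1), 0.5)
print(heap.pop_min())    # ((1, 2), 1.5)
print(heap.pop_min())    # None
```

Each push or pop costs time logarithmic in the heap size, so only the candidates whose theta actually changed pay for an update.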

ASYMPTOTIC RUNTIME IMPROVEMENT

Our implementation has an asymptotic runtime of $O(|V||E| + |V|^2 \log|V|)$, while it is $O(|V|^2|E|)$ for Burns’ implementation;

Very similar to the improvement from the Bellman-Ford algorithm ($O(|V||E|)$) to Dijkstra’s algorithm ($O(|E| + |V|\log|V|)$) for the shortest path problem.
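To get a feel for the gap between the two bounds, the sketch below plugs sample graph sizes into |V|^2 |E| and |V||E| + |V|^2 log|V|, ignoring constant factors. The sizes are made-up illustrative values, not the benchmark circuits, and the base-2 logarithm is an arbitrary choice since the base does not matter asymptotically.

```python
import math


def burns_bound(v, e):
    """Operation-count proxy for O(|V|^2 |E|)."""
    return v * v * e


def enhanced_bound(v, e):
    """Operation-count proxy for O(|V||E| + |V|^2 log|V|)."""
    return v * e + v * v * math.log2(v)


for v, e in [(100, 500), (1000, 5000), (10000, 50000)]:
    ratio = burns_bound(v, e) / enhanced_bound(v, e)
    print(f"|V|={v:>6} |E|={e:>6} ratio ~ {ratio:.0f}")
```

The ratio grows with graph size, which fits the expectation that the benefit is largest on the bigger circuits. Measured speedups also depend on constant factors that this estimate leaves out.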

EXPERIMENTAL SETUP

Benchmark circuits
  ISCAS89
  large circuit ITC99

Delay data
  Resynthesis in Synopsys Design Compiler (VHDL)
  Delay is exported from Standard Delay Format (SDF) file

Comparison between Burns’ and ours
  Same graph data structure
  Same graph manipulating subroutines
  Same routine for calculating theta values

EXPERIMENTAL RESULTS

CONCLUSIONS

A much more efficient implementation of the primal-dual based algorithm, improving on the runtime of Burns’ implementation

Superior in both theoretical and practical runtime efficiency

On average 95x speedup on 20 test circuits
