more scheduling resource sharing - cornell university · 2018-10-09 · resource sharing and...

Post on 29-Jan-2020

0 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

More SchedulingResource Sharing

ECE 5775High-Level Digital Design Automation

Fall 2018

▸ Lab 3 is released (due Friday 10/5)– 10 FPGA boards available – Go through the CORDIC tutorial asap

1

Announcements

▸ More SDC scheduling

▸ Resource sharing overview– Sub-problems: functional unit, register, and connectivity binding

problems– Key concepts: compatibility and conflict graphs

2

Outline

▸ ILP for time-constrained scheduling minimize cTyx1,1 + x2,1 +x6,1 + x8,1 – y1 £ 0x6,2 + x7,2 + x8,2 – y1 £ 0 x7,3 + x8,3 – y1 £ 0x5,4 + x9,4 + x11,4 – y2 £ 0 …

3

Review: ILP Formulation for TCS

What is the y vector?

+´´´´

´ ´ + <

-

-

1

2

3

4

v2v1

v3

v4

v5

v6

v7

v8

v9

v10

v11

´

´´

´

´

+ <

-

-

1

2

3

4

v2v1

v3

v4

v5

v6

v7 v8

v9

v10

v11

4

Review: SDC-Based Scheduling

• Target cycle time: 5ns• Delay estimates

– Add (+) 1ns– Load (ld) 3ns– Store (st) 1ns

si : schedule variable for operation ild

+

ldld

+

v1

v3

v4

v2v0

1ns

3ns

1ns

§ Dependence constraints<v0 ,v4>:s0 – s4 ≤0<v1 ,v3>:s1 – s3 ≤0<v2 ,v3>:s2 – s3 ≤0<v3 ,v4>:s3 – s4 ≤0<v4 ,v5>:s4 – s5 ≤0

§ Cycle time constraintsv2à v5 :s2 – s5 ≤-1v1à v5 :s1 – s5 ≤-1

stv51ns

▸ A linear programming formulation based on system of integer difference constraints (SDC)

▸ Difference constraints can be conveniently represented using constraint graph– Each vertex represents a variable and each weighted edge

corresponds to a different constraint – Detect infeasibility by the presence of negative cycle (by solving

single-source shortest path)

5

SDC Constraint Graph

s0

s1

s2

s3s4

00

0

0

-1

s0 – s4 ≤0s1 – s3 ≤0s2 – s3 ≤0s3 – s4 ≤0s4 – s5 ≤0s2 – s4 ≤-1s1 – s4 ≤-1s4 – s2 ≤0

0-1

s2 – s4 ≤-1s5

0

-1

s2 – s4 ≤-1s4 – s2 ≤0

0≤-1

▸ Resource constraints cannot be represented exactly in integer difference form

6

Handling Resource Constraints (NP-Hard in General)

§ Resource constraintsè Heuristic partial orderings

v0à v2 :s0 – s2 ≤-1

OR

v1à v0 :s1 – s0 ≤-1v2à v0 :s2 – s0 ≤-1

3 cycle latency

2 cycle latency

ld

+

ldld

+

v1

v3

v4

v2v0

1ns

3ns

1ns

stv51ns

• Resource constraint– Two read ports

[J. Cong & Z. Zhang, DAC, 2006] [Z. Zhang & B. Liu, ICCAD, 2013]

7

Exact and Practically Scalable Scheduling with SDC and SAT (SDS)

Conflict based search

Graph basedfeasibility checkingPolynomial time~1M variables

>1M clauses

SATResource

Constraints

Partial orderings

Difference constraints

s0 – s4 ≤0s1 – s3 ≤0s2 – s3 ≤0s3 – s4 ≤0s4 – s5 ≤0s2 – s5 ≤-1s1 – s5 ≤-1

SDCTiming

Constraints

InfeasibilityConflict clauses

Conflict-driven learning

[S. Dai, G. Liu, and Z. Zhang, FPGA 2018]

▸ Given a Boolean function F(x1, x2, … xn), find an assignment to xi’s to make F evaluate to 1– If such assignment exists, F is satisifiable– Otherwise, F is unsatisfiable

▸ Example: (x + y + z) (x’ + y’ + z) (x’ + y’ + z’) – A satisfying assignment: x=1, y=0, z=1

▸ First NP-complete problem (Cook-Levin theorem)

▸ Numerous practical applications– Hardware/software verification (e.g., equivalence checking, model checking)– Artificial intelligence (e.g., planning, automated reasoning)– Automated theorem proving– Combinatorial design

8

Boolean Satisfiability Problem (SAT)

▸ SAT solvers have made significant progress in scalability– From toy problems with 100-200 variables (early 90s) – To industrial applications with 1M+ variables, 5M+ constraints

(2010s)

▸ Modern SAT solvers typically employ a backtracking-based search algorithm where conflict-driven clause learning is a key to efficiency

9

Scalability of SAT Solvers

[source: A. Sabharwal, Modern SAT Solvers:Key Advances and Applications, 2011]

Ruv : whether operation u is sharing the same resource with operation v

Ouàv : denotes whether operation u is scheduled earlier than v

Ordering constraints: Operations sharing the same resources must be scheduled apart

R01à (O0à1∨O1à0 )~(O0à1∧ O1à0 )R02à (O0à2∨O2à0 )~(O0à2∧ O2à0 )R12à (O1à2∨O2à1 )~(O1à2∧ O2à1 )

Note:R01à (O0à1∨O1à0 )meansR01implies(O0à1∨O1à0 )

10

Encoding Resource Constraints in SAT

SAT clauses for ordering three load operations

ld

+

ldld

+

v1

v3

v4

v2v0

stv5

Two read ports available

Partial orderings

Difference constraints

InfeasibilityConflict clauses

▸ Is 2-cycle schedule feasible?

Conflict-Driven Learning

1 -1 -1

-1propose

conflict

Negative cyclesum = -1

11

s0

s1

s2

s3s4

-1

s5

-1

What SAT learns from SDC:Any ordering involving operation 0 before 2 should no longer be attempted

▸ Is a 2-cycle schedule feasible?

Conflict-Driven Learning

1propose

conflict

Negative cyclesum = -2

12

s0

s1

s2

s3s4

-1

s5

-1

-1-1

-1

▸ Is a 2-cycle schedule feasible?

Conflict-Driven Learning

1propose

No conflict

No negative cycle

13

s0

s1

s2

s3s4

-1

s5

-1

-1-1

Feasible! Returns schedule.

▸ Generate short conflicts– Shorter conflict è more pruning è faster

convergence

– Negative cycle = irreducibly inconsistent set of constraints• Keeps conflicts short• Becomes consistent if any constraint is removed from the set

14

Fast Conflict-Driven Learning

▸ Combining SDC and SAT with conflict-driven learning enables fast yet exact resource-constrained scheduling– Up to 1000X faster than ILP

▸ Broader applications– Not just specific to HLS– Applies to constrained scheduling problems in

other fields

15

Take-Away Points on SDS Scheduling

High-level Programming Languages

(C/C++, SystemCMatlab, ...)

Compilation

Transformations

RTLgeneration

Recap: High-Level Synthesis Flow

S0

S1

S2

S0

S1

S2

ab

z

d

3 cycles

*–

Control data flow graph (CDFG)

Finite state machines with datapath

BB3

BB1

BB2

BB4

T F

+

-

*+

*

if (condition) {…

} else {t1 = a + b;t2 = c * d;t3 = e + f;t4 = t1 * t2;z = t4 – t3;

}

16

Scheduling Binding

Allocation

Resource Sharing and Binding

▸ Resource sharing: shares resources to minimize cost, in resource usage/area/power– Typically carried out by binding in high-level synthesis– Other subtasks such allocation and scheduling greatly impact

the resource sharing opportunities

▸ Binding: maps operations, variables, and/or data transfers to the available resources– After scheduling: decide resource usage and detailed

architecture (focus of this lecture)– Before scheduling: affect both area and delay – Simultaneous scheduling and binding: better result but more

expensive

17

▸ Functional unit (FU) binding– Primary objective is to minimize the number of FUs– Considers connection cost

▸ Register binding– Primary objective is to minimize the number of registers– Considers connection cost

▸ Connectivity binding– Minimize connections by exploiting the commutative property of

some operations / FUs– NP-hard

18

Binding Sub-problems

Sharing Conditions

▸ Functional units (registers) are shared by operations (variables) of same type whose lifetimes do not overlap – Lifetime: [birth-time, death-time)

• Operation: The whole execution time (if unpipelined)• Variable: From the time this variable is defined to the time it

is last used

19

20

Operation Binding

Functional Unit Operations

Mul1 op1, op3

AddSub1 op2, op4

AddSub2 op5, op6

clock edge

×

×

+

+ +−

2 31

ab

cdefg

op1op2

op3

op4

op5

op6

Functional Unit Operations

Mul1 op1, op3

AddSub1 op2, op4, op6

AddSub2 op5

Binding 1 Binding 2

21

Register Binding

clock edge

×

×

+

+ +

2 31

a

b

c

d

e

f

g

Lifetime crossing clock edge;Register Implied

22

Variable Lifetime Analysis

v1 [1, 2)v2 [2, 3)v3 [3, 4)

Variables v1, v2, and v3 can share the same register

Variable lifetimes

clock edge

×

×

+

+ +−

2 3 41

a

b

cdefg

v1v2

v3

▸ Operation/variables compatibility:– Same type, non-overlapping lifetimes

▸ Compatibility graph: – Vertices: operations/variables – Edges: compatibility relation

▸ Conflict graph: Complement of compatibility graph

23

Compatibility and Conflict Graphs

a b

c

d

a b

c

d

A scheduled DFG(operations have the same type) Compatibility graph

a b

c

d

Conflict graph

Note: The graphs for variables/registers can be constructed in a similar way

Clique Cover Number and Chromatic Number

▸ Compatibility graph:– Partition the graph into a minimum number of cliques

• Clique in an undirected graph is a subset of its vertices such that every two vertices in the subset are connected by an edge

▸ Conflict graph:– Color the vertices by a minimum number of colors (chromatic

number), where adjacent vertices cannot use the same color

24

a b

c

d

a b

c

d

A scheduled DFG Clique partitioning on compatibility graph

a b

c

d

Coloring on conflict graph

Operations have same type

▸ Next lecture: More Binding, Pipelining

25

Before Next Class

▸ These slides contain/adapt materials developed by– Prof. Deming Chen (UIUC)– Prof. Jason Cong (UCLA)

26

Acknowledgements

top related