Transcript

Device and architecture co-optimization

– Large search space

– Need fast yet accurate power and delay estimator for FPGAs

Trace-based power and delay estimator (Ptrace)

Optimization result

– Reduce energy delay product by 184 and area by 23

– LUT size 5 provides the maximum combined power and delay yield

[Figure: wafer map of ILeakN, wafer 24, Layout 1]

Target function driven component analysis (FCA)

– Given a target function f(X1, X2)

– Find the linear decomposition matrix W that minimizes the error in the mean, variance, and skewness of f() when high-order dependence is ignored

– FCA has the same complexity as PCA and ICA but is more accurate

Approximate max operation using second order polynomial

Works for all three delay models

More efficient and accurate than the Fourier series approach

– 20X faster than the Fourier series approach

– Computational complexity: O(n^3) for the quadratic delay model, O(n) for the others

Within 2% error compared to MC simulation
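The second-order polynomial approximation of the max operation can be illustrated with a small sketch. Since max(A, B) = B + max(A - B, 0), it is enough to approximate max(V, 0) for V = A - B. The code below assumes V is Gaussian and fits the polynomial by least-squares regression on samples; both choices, along with the function name and parameters, are assumptions made for illustration and are not taken from the poster.

```python
import numpy as np

def fit_quadratic_max(mu, sigma, n_samples=200_000, seed=0):
    """Fit c0 + c1*V + c2*V**2 to max(V, 0) for V ~ N(mu, sigma^2).

    Least-squares regression on samples is an assumed fitting method,
    chosen for brevity; it is not the poster's procedure.
    """
    rng = np.random.default_rng(seed)
    v = rng.normal(mu, sigma, n_samples)
    target = np.maximum(v, 0.0)
    # np.polyfit returns coefficients from highest degree to lowest.
    c2, c1, c0 = np.polyfit(v, target, deg=2)
    approx = c0 + c1 * v + c2 * v**2
    return (c0, c1, c2), target, approx

coeffs, exact, approx = fit_quadratic_max(mu=0.0, sigma=1.0)
mean_err = abs(approx.mean() - exact.mean())   # error in the mean
std_ratio = approx.std() / exact.std()         # fitted spread vs. sampled spread
print(f"coefficients (c0, c1, c2): {coeffs}")
print(f"mean error: {mean_err:.2e}, std ratio: {std_ratio:.3f}")
```

Because the result is again a polynomial in the variation sources, it stays in the same form as the quadratic delay model and can be propagated through later add and max operations.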

Max operation using Fourier series approximation

– Approximate PDF of variation sources by Fourier series

– Apply moment matching to reconstruct the canonical form of the max operation

All operations are based on either closed-form formulae or lookup tables

– Computational complexity O(nK^2)

Only works for linear and semi-quadratic delay models

Within 5% error compared to MC simulation
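As a rough illustration of the first step, approximating a PDF by a Fourier series, the sketch below computes a truncated series on a bounded interval and checks it against the original PDF. The standard normal PDF, the interval half-width, the number of terms, and all function names are assumptions chosen so that the result is easy to verify; the moment-matching step for the max operation is not reproduced here.

```python
import numpy as np

def fourier_pdf_coeffs(pdf, half_width, n_terms, n_grid=4001):
    """Cosine (a) and sine (b) coefficients of pdf on [-half_width, half_width]."""
    x = np.linspace(-half_width, half_width, n_grid)
    dx = x[1] - x[0]
    f = pdf(x)
    k = np.arange(n_terms + 1)[:, None]            # harmonic indices as a column
    phase = k * np.pi * x[None, :] / half_width
    # Riemann-sum approximation of the coefficient integrals.
    a = (f * np.cos(phase)).sum(axis=1) * dx / half_width
    b = (f * np.sin(phase)).sum(axis=1) * dx / half_width
    return a, b

def fourier_pdf_eval(a, b, half_width, x):
    """Evaluate the truncated Fourier series at the points x."""
    k = np.arange(len(a))[:, None]
    phase = k * np.pi * np.asarray(x)[None, :] / half_width
    harmonics = a[1:, None] * np.cos(phase[1:]) + b[1:, None] * np.sin(phase[1:])
    return a[0] / 2 + harmonics.sum(axis=0)

def normal_pdf(x):
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

half_width = 6.0                                   # assumed truncation interval
a, b = fourier_pdf_coeffs(normal_pdf, half_width, n_terms=12)
xs = np.linspace(-4.0, 4.0, 201)
max_err = np.max(np.abs(fourier_pdf_eval(a, b, half_width, xs) - normal_pdf(xs)))
print(f"max abs error with 12 harmonics: {max_err:.2e}")
```

The number of harmonics here plays the role of a truncation order; a smooth PDF needs only a few terms, while a PDF with sharp edges would need many more.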

With CMOS technology scaling, process variation has become a potential show-stopper if not handled appropriately. These variations introduce significant uncertainty in both circuit performance and leakage power. Statistical modeling, analysis, and optimization for VLSI circuits have thus become a frontier research topic in recent years for combating such variation effects.

As process technology advances to the nanometer scale and low-energy embedded applications are explored for FPGAs, power consumption becomes a crucial design constraint. It is well known that architecture and device settings have a great impact on FPGA power and performance. However, how to perform statistical optimization considering both device and architecture has not been solved by previous work. In addition, reliability issues such as device aging and soft error rate (SER) may affect the performance of FPGAs. Such impact was not considered in previous work either.

Besides FPGAs, statistical modeling and analysis for ASICs are also hot research topics. There are many works on statistical timing and power modeling and analysis. However, how to efficiently perform statistical static timing analysis (SSTA) for non-linear delay models with non-Gaussian variation sources is still a hard problem.

Moreover, most statistical analysis assumes independent variation sources and applies principal component analysis (PCA) or independent component analysis (ICA) to decompose dependent variation sources. However, some variation sources are non-linearly dependent, such as Leff and Vth. In this case, a linear operation (such as PCA or ICA) cannot completely remove the dependence. How to handle non-linearly dependent variation sources is another unsolved problem.
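The point about linear operations can be checked numerically. The sketch below builds two synthetic variation sources with both linear and quadratic coupling, applies PCA, and then measures what dependence remains. The data-generating formula and the second-order measure (correlation between P1^2 and P2) are assumptions made for illustration; they are not the Leff/Vth data or the high-order correlation coefficient defined on the poster.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
z = rng.standard_normal(n)
e = rng.standard_normal(n)

# Two synthetic sources: x2 depends on x1 both linearly and quadratically.
x1 = z
x2 = 0.5 * z + 0.5 * (z**2 - 1.0) + 0.1 * e
X = np.column_stack([x1, x2])

# PCA: project centered samples onto the eigenvectors of the covariance matrix.
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]        # largest-variance component first
P = Xc @ eigvecs[:, order]

# Linear correlation between the principal components (removed by PCA).
linear_corr = np.corrcoef(P[:, 0], P[:, 1])[0, 1]
# Illustrative second-order measure: correlation between P1^2 and P2.
high_order_corr = np.corrcoef(P[:, 0] ** 2, P[:, 1])[0, 1]

print(f"linear correlation after PCA:  {linear_corr:+.2e}")
print(f"corr(P1^2, P2) after PCA:      {high_order_corr:+.3f}")
```

The linear correlation vanishes to numerical precision, while the second-order measure stays clearly non-zero, so treating P1 and P2 as independent would misestimate any function that is non-linear in them.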

Spatial correlation is another concern in statistical analysis. Many recent works model spatial correlation as a function of distance. However, some recent research observes that spatial correlation mainly comes from deterministic across-wafer variation, and that purely random spatial variation is not significant. Modeling across-wafer variation is also a challenging problem.

PhD'09: Statistical Modeling and Optimization for VLSI Circuits

Student: Lerong Cheng (lerong@ee.ucla.edu); Advisor: Lei He; Co-advisor: Puneet Gupta

EDA Lab (http://eda.ee.ucla.edu), Electrical Engineering Department, UCLA

L. Cheng, P. Wong, F. Li, Y. Lin, and L. He, "Device and Architecture Co-Optimization for FPGA Power Reduction," DAC 2005

P. Wong, L. Cheng, Y. Lin, and L. He, "FPGA Device and Architecture Evaluation Considering Process Variation," ICCAD 2005

L. Cheng, J. Xiong, and L. He, "FPGA Performance Optimization via Chipwise Placement Considering Process Variations," FPL 2006

L. Cheng, J. Xiong, and L. He, "Non-Linear Statistical Static Timing Analysis for Non-Gaussian Variation Sources," DAC 2007

L. Cheng, J. Xiong, and L. He, "Non-Gaussian Statistical Timing Analysis Using Second-Order Polynomial Fitting," ASPDAC 2008

L. Cheng, Y. Lin, L. He, and Y. Cao, "Trace-Based Framework for Concurrent Development of Process and FPGA Architecture Considering Process Variation and Reliability," ISFPGA 2008

L. Cheng, P. Gupta, and L. He, "Accounting for Non-linear Dependence Using Function Driven Component Analysis," ASPDAC 2009

Collaborators: Dr. Jinjun Xiong, Dr. Yan Lin, Dr. Fei Li, and Miss Phoebe Wong

Introduction

Block-based SSTA operations

– Add (simple)

– Max (hard)

Core operation of SSTA

Delay model

– Linear: efficient but not accurate

– Quadratic: accurate but not efficient

– Quadratic without crossing terms (semi-quadratic): efficient and somewhat accurate
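The three delay models differ only in which second-order terms they keep. A minimal sketch follows, assuming a nominal delay d0, a sensitivity vector a, and second-order coefficients B (a full matrix) or b_diag (its diagonal only); these symbol names and the example values are hypothetical, not the poster's notation.

```python
import numpy as np

def linear_delay(d0, a, x):
    """d0 + sum_i a_i * x_i: n coefficients."""
    return d0 + a @ x

def semi_quadratic_delay(d0, a, b_diag, x):
    """Adds squared terms b_i * x_i^2 but no crossing terms: 2n coefficients."""
    return d0 + a @ x + b_diag @ (x * x)

def quadratic_delay(d0, a, B, x):
    """Adds all second-order terms x_i * B_ij * x_j: O(n^2) coefficients."""
    return d0 + a @ x + x @ B @ x

# Hypothetical example with three variation sources.
d0 = 1.0
a = np.array([0.10, -0.20, 0.05])
b_diag = np.array([0.01, 0.02, 0.03])
x = np.array([1.0, 2.0, -1.0])

d_lin = linear_delay(d0, a, x)
d_semi = semi_quadratic_delay(d0, a, b_diag, x)
# With a diagonal B, the full quadratic model reduces to the semi-quadratic one.
d_quad = quadratic_delay(d0, a, np.diag(b_diag), x)
print(d_lin, d_semi, d_quad)
```

The growth in coefficient count from n to roughly n^2 is what makes the full quadratic model more expensive to propagate through add and max operations.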

Analysis of Non-Linear Dependence

Statistical Modeling and Optimization for FPGAs

References & Collaborators

Statistical Static Timing Analysis

Modeling of Across-Wafer Variation

[Figure: Ptrace flow. Labels: trace collection (VPR, Psim); device independent; device dependent; switching activity; ratio of short-circuit power; critical path structure; circuit element statistics; area; circuit-level delay and power; chip-level area, delay, and power]

Concurrent design of process and FPGA architecture

– Develop process and architecture concurrently in order to shorten the time to market

– Need to estimate FPGA power and delay from process parameters

Ptrace2

– Based on ITRS MASTAR4 transistor model

Analysis result

– Device aging leads to 85 delay degradation after 10 years

– Neither device aging nor process variation has an impact on SER

FPGA chipwise placement for timing optimization

– Programmability of FPGAs offers a unique opportunity to leverage process variation and improve circuit performance

– Perform placement according to the chipwise variation maps

– Improve performance up to 121

Across-wafer variation can be approximated as a quadratic function

After subtracting the across-wafer variation, purely random spatial correlation is not significant

From the die point of view, within-wafer variation appears spatially correlated

– This variation is not purely random

– Cannot be modeled as random correlated variation

New Variation Model

– Exactly model the across-wafer variation

– Only 4 random variables: Xc, Yc, mw, and r

– More accurate and efficient than the spatial correlation model
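The observation that across-wafer variation is close to a quadratic function can be sketched as a least-squares surface fit followed by a look at the residual. Everything below is synthetic and assumed for illustration: the grid of die sites, the bowl-shaped surface, the noise level, and the general six-coefficient quadratic form. It does not reproduce the poster's four-variable model (Xc, Yc, mw, r) or its measured wafer data.

```python
import numpy as np

def fit_quadratic_surface(x, y, v):
    """Least-squares fit of v ~ c0 + c1*x + c2*y + c3*x^2 + c4*x*y + c5*y^2."""
    A = np.column_stack([np.ones_like(x), x, y, x * x, x * y, y * y])
    coeffs, *_ = np.linalg.lstsq(A, v, rcond=None)
    return coeffs, A @ coeffs

rng = np.random.default_rng(2)

# Synthetic grid of die sites with coordinates normalized to [-1, 1].
gx, gy = np.meshgrid(np.linspace(-1, 1, 9), np.linspace(-1, 1, 12))
x, y = gx.ravel(), gy.ravel()

# Assumed across-wafer trend (a shallow bowl) plus small random variation.
true_surface = 1.0 - 0.10 * ((x - 0.2) ** 2 + (y + 0.1) ** 2)
measured = true_surface + rng.normal(0.0, 0.005, x.size)

coeffs, fitted = fit_quadratic_surface(x, y, measured)
residual = measured - fitted     # what is left after removing the wafer trend

print(f"std before subtracting the fit: {measured.std():.4f}")
print(f"std after subtracting the fit:  {residual.std():.4f}")
```

In this synthetic case nearly all of the apparent die-to-die correlation is explained by the fitted surface, which is the kind of check that motivates modeling the across-wafer trend directly instead of as random correlated variation.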

[Figure: wafer map of frequency, wafer 24, Layout 1]

Location   Our model                     Spatial correlation model
           μ     σ     95    T (s)       μ     σ     95    T (s)
LL-C       +07   +11   +05   153         +24   +15   +52   154 (101X)
LR-C       +02   +11   -02   147         +00   +88   -14   155 (105X)
UL-C       -02   -06   +01   152         -02   +76   -01   153 (101X)
UR-C       +02   +06   +01   149         -07   +48   -13   152 (102X)

[Figure: FCA flow. Boxes: target function f(X1, X2); samples of X1, X2; joint moments of X1, X2; error of moments of f as a function of the transfer matrix; nonlinear programming minimizing the error of moments of f; transfer matrix W]

[Figure: error estimation flow. Boxes: target function f(X1, X2); samples of X1, X2; transfer matrix W; joint moments of X1, X2; moments and ρij of P1, P2; function g(P1, P2) of P1, P2; result with correct dependence; result assuming ρij = 0; error]

A linear operation is used to decompose dependent variation sources

Not accurate in the presence of non-linear dependence

Need to estimate the error introduced by ignoring non-linear dependence

Define a high-order correlation coefficient

Circuit delay Comparison

Wafer frequency

Wafer leakage

Across-wafer variation appears as spatial correlation from the die point of view

PDF comparison

Approximate max operation as second order polynomial

PDF comparison

Fourier Series approximation of PDF

Performance improvement under different utilization rates

Performance improvement histogram
