Design and Post Optimization Flow for Advanced Thermal Management by Use of Body Bias

Hyung-Ock Kim, Jun Seomun, Jaehan Jeon, Chungki Oh, Wook Kim, Kyung-Tae Do, Jung Yun Choi, Hyo-Sig Won, Kee Sup Kim

Samsung Electronics, Korea


Post on 30-Mar-2015




Introduction (1)

Temperature, One of the Design Keys in Mobile SoC

Temperature-limited operation is inevitable in mobile devices to prevent skin burns.

Owing to performance trends and the small form factor, temperature is a crucial design criterion in mobile SoC design.

Thermal Management

Thermal management keeps the silicon below its temperature limit at the cost of performance.

Conventional thermal management is achieved by a voltage / frequency drop [1-3], i.e. the thermal throttling shown in Figure 1, which necessarily comes with a performance drop.

It is also required to prevent power source shutdown whenever thermal runaway happens.

Since leakage power has strong feedback with temperature, it is an important driver of temperature increase in mobile SoC. Leakage power is given by [4]:

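[Equation (1): leakage power P_leak, an exponential function of junction temperature T_j with terms in V_dd and V_th and fitted constants K_leak; see [4].]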

Figure 1. Thermal throttling operation. [Plot over time of temperature, Vdd, and frequency: the voltage and frequency drop during thermal throttling operation, between the thermal upper limit and the thermal lower limit.]


Introduction (2)

Body Bias Control [5]

Figure 2 shows the leakage current reduction achieved by reverse body bias (RBB) in nanometer-scale technologies.

RBB can be utilized to relieve thermal throttling by weakening the leakage-temperature feedback.

Advanced Thermal Design and Management by Body Bias Use

We propose a body bias design and optimization scheme spanning from system-level design to post-silicon tuning to enhance thermal management.

In the design stage, thermal-leakage feedback and body bias design cost are formulated to decide whether to use body bias, which is followed by body bias implementation.

In post silicon, body bias use is explored and optimized both to maximize peak performance and to save total power.

The proposed scheme has been implemented in a commercial 32nm HKMG mobile SoC, Exynos 4 Quad, and it results in up to 12.3% performance improvement in high speed mode and up to 19.1% total power saving.

Figure 2. Leakage current reduction by RBB of 0.4V. [Bar chart: Ioff decrease by 0.4V RBB @ SS for 65nm, 45nm, 32nm, and 28nm; vertical axis from -45.0% to -65.0%.]


Advanced Thermal Management by Body Bias

Overall Design and Optimization Flow

In the early-stage design decision, the cost and gain of body bias are evaluated for a CPU core and the other digital blocks (referred to as SoC).

Once body bias use is decided, body bias circuits, including body bias generators (BBG) and a power management unit, are integrated into the design.

In the back-end, the body bias network is implemented and validated.

Post-silicon optimization is body bias tuning that minimizes total power and maximizes peak performance with respect to process variation and temperature.

Figure 3. Design flow of advanced thermal management. [Flow: front-end design, back-end design, silicon, silicon test, board / SW stacks; body bias steps: early-stage design decision, body bias design, body bias implementation, post-silicon optimization.]


Advanced Thermal Design (1)

Body Bias Design Flow

Early-stage design decision in a CPU core

-Body bias comes at the cost of area: body bias generator, body bias network, and power management unit

-The cost of body bias can be estimated at the floorplan stage, and then it is decided whether body bias is adopted or not

-If thermal runaway by leakage is expected to appear, we must adopt body bias so as not to lose performance to leakage current

Body bias area estimation

$$A_{abb} = n \cdot A_{bbg} + A_{ctrl} + route\_area \qquad (2)$$

where route_area is either a fixed fraction of block_area or 0, depending on the routing congestion route_cong.

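Thermal runaway estimation [4]

$$P_{tot} = P_{leak} + P_{dyn}, \qquad P_{dyn} = K_{dyn} V_{dd}^{2} f$$

$$T_j = T_a + \theta_{ja} P_{tot} \qquad (3)$$

Here P_leak is the temperature-dependent leakage power of equation (1), T_a is the ambient temperature, and θ_ja is the junction-to-ambient thermal resistance.

If equation (3) does not converge, thermal runaway is expected.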

[Diagram labels: Chip, CPU, SoC.]

In equation (2), n comes from the body current calculation (covered later), and the remaining coefficient comes from design experience.
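The convergence test behind the thermal runaway estimation can be sketched as a fixed-point iteration on T_j = T_a + θ_ja · (P_leak(T_j) + P_dyn). This is a minimal illustration, not the authors' tool: the leakage model `example_leak_power`, its constants, the 500 K divergence cutoff, and all function names are assumptions made for the example.

```python
import math


def junction_temperature(t_ambient, theta_ja, p_dyn, leak_power,
                         t_limit=500.0, tol=1e-3, max_iter=200):
    """Iterate T_j = T_a + theta_ja * (P_leak(T_j) + P_dyn).

    Temperatures are in kelvin, powers in watts, theta_ja in K/W.
    Returns the converged T_j, or None when the iteration diverges
    (interpreted here as thermal runaway).
    """
    t_j = t_ambient
    for _ in range(max_iter):
        t_next = t_ambient + theta_ja * (leak_power(t_j) + p_dyn)
        if t_next > t_limit:          # hypothetical divergence cutoff
            return None
        if abs(t_next - t_j) < tol:   # fixed point reached
            return t_next
        t_j = t_next
    return None


def example_leak_power(t_j, k1=2.0e4, k2=3500.0):
    """Hypothetical leakage model: grows exponentially with temperature."""
    return k1 * math.exp(-k2 / t_j)


# Low thermal resistance: the loop settles at a stable temperature.
stable = junction_temperature(300.0, 10.0, 1.0, example_leak_power)
# High thermal resistance: leakage feedback never settles.
runaway = junction_temperature(300.0, 40.0, 1.0, example_leak_power)
```

With these made-up constants the first call converges and the second returns `None`, which corresponds to the case where body bias would have to be adopted.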


Advanced Thermal Design (2)

Body Bias Design Flow

Body current calculation to determine the number of BBGs

-A body current (GIDL and junction leakage) calculator has been developed, so that we can find the proper number of BBGs to drive a block

-The proposed calculator utilizes a set of look-up tables that are pre-defined for logic and memory bit cells by SPICE simulation

-Figure 4 presents the calculation flow of body current; Figure 5 compares it to silicon measurement and shows that the proposed calculator overestimates body current by up to 20%

-The number of BBGs is given by max body current / BBG driving limit

Figure 4. Body current calculator. [Flow: design information (gate counts, number of bit cells) and operating conditions (process corner, Vdd, temperature, body bias) are used to search the body current look-up tables; total body currents are then calculated with the design information.]

Figure 5. Comparison of calculated and measured body current in SoC silicon. [Bar chart: error (0.0% to 25.0%) versus RBB (0.2V, 0.3V) for 1.0V Vdd and 1.1V Vdd.]
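The look-up-table flow of Figure 4 and the rule "number of BBGs = max body current / BBG driving limit" can be sketched as follows. The table contents, the key layout, the driving limit, and every name here are hypothetical; the real tables come from SPICE simulation and are not given in the slides.

```python
import math

# Hypothetical look-up tables: body current per cell in amperes, keyed by
# (process corner, Vdd [V], temperature [C], reverse body bias [V]).
LOGIC_LUT = {
    ("FF", 1.1, 105, 0.4): 2.0e-9,
    ("FF", 1.0, 105, 0.4): 1.5e-9,
}
BITCELL_LUT = {
    ("FF", 1.1, 105, 0.4): 1.0e-10,
    ("FF", 1.0, 105, 0.4): 0.8e-10,
}


def body_current(gate_count, bitcell_count, condition):
    """Total body current of a block for one operating condition."""
    return (gate_count * LOGIC_LUT[condition]
            + bitcell_count * BITCELL_LUT[condition])


def bbg_count(gate_count, bitcell_count, conditions, bbg_drive_limit):
    """Number of BBGs needed to drive the worst-case body current."""
    worst = max(body_current(gate_count, bitcell_count, c)
                for c in conditions)
    return math.ceil(worst / bbg_drive_limit)


# Example block: 5M gates, 8M bit cells, 4 mA driving limit per BBG.
n_bbg = bbg_count(5_000_000, 8_000_000, list(LOGIC_LUT), 4e-3)
```

Rounding up with `math.ceil` is a design choice of this sketch; since the calculator is reported to overestimate body current by up to 20%, the resulting count errs on the safe side.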


Advanced Thermal Design (3)

Body-Bias-aware Thermal Control

Figure 6 shows the overall thermal management scheme utilizing body bias.

The thermal management unit (TMU) periodically reads out temperatures from on-chip sensors.

Once a temperature exceeds the thermal upper limit (recall Figure 1), the interrupt controller asserts thermal throttling to the CPU, and the CPU then controls Vdd, frequency, and body bias through the Vdd / freq / ABB manager.

The Vdd / freq / ABB values for thermal management are defined in post-silicon optimization so as to maximize thermal relaxation efficacy and minimize performance loss.

Figure 6. Thermal management scheme using Vdd, frequency, and body bias control. [Block diagram: temperature sensors, TMU, interrupt controller, CPU, Vdd / Freq / ABB manager, BBG, PLL, regulator.]
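The control loop of Figure 6 can be sketched as a small state machine. This is an illustration only: the two operating points, their Vdd and RBB values, and the assumption that throttling is released once the temperature falls below the thermal lower limit of Figure 1 are not taken from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    vdd: float       # supply voltage [V]
    freq_mhz: int    # CPU clock [MHz]
    rbb: float       # reverse body bias [V]


# Hypothetical operating points; real values come from post-silicon tuning.
NORMAL = OperatingPoint(vdd=1.20, freq_mhz=1400, rbb=0.4)
THROTTLED = OperatingPoint(vdd=1.00, freq_mhz=1000, rbb=0.4)


def run_tmu(samples, upper, lower):
    """Map periodic temperature readings to operating points.

    Throttling starts when a reading exceeds `upper` and is released
    only after a reading falls below `lower` (hysteresis).
    """
    throttled = False
    trace = []
    for temp in samples:
        if not throttled and temp > upper:
            throttled = True      # interrupt: assert thermal throttling
        elif throttled and temp < lower:
            throttled = False     # cooled down: restore normal mode
        trace.append(THROTTLED if throttled else NORMAL)
    return trace


trace = run_tmu([70, 86, 84, 79, 90], upper=85, lower=80)
```

The gap between the two limits keeps the loop from toggling on every sample when the temperature hovers near the upper limit.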


Post-Silicon Thermal Optimization (1)

Search for Optimal Thermal Management Point

Thermal management optimization can be achieved by empirical practice, because measurement exactly captures the temperature changes of real user scenarios on real mobile sets (e.g. smartphone, tablet).

Using RBB expands the search space, because RBB efficacy depends on the leakage portion, and the Vdd compensation for the slowdown increases dynamic power.

The leakage portion is changed by process variation, DVFS control, and even ambient temperature.

Figure 7 presents the inverter path delay increase by RBB, where 40~50mV of Vdd compensation is required for 0.4V RBB.

This can increase engineering cost and optimization turnaround time (TAT) in post silicon.

Therefore, we use “simplified” RBB policies:

-Use of RBB is decided for each silicon group (binning group) with respect to process variation

-But we “merge” the RBB applying conditions of silicon groups if they show similar characteristics
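Why RBB efficacy depends on the leakage portion can be illustrated with a first-order power model. The sketch assumes that dynamic power scales with Vdd squared, that RBB removes a fixed fraction of leakage power, and that the effect of the Vdd compensation on leakage itself is negligible. The 50% leakage reduction is a placeholder, not a value read from Figure 2, and the function names are hypothetical.

```python
def rbb_power_ratio(leak_fraction, leak_reduction, vdd, vdd_comp):
    """Total power with RBB relative to bypass (below 1.0 means a saving).

    leak_fraction:  share of leakage in total power without RBB
    leak_reduction: fraction of leakage removed by RBB
    vdd, vdd_comp:  nominal supply and the compensation added for RBB [V]
    """
    dyn_scale = ((vdd + vdd_comp) / vdd) ** 2   # dynamic power ~ Vdd^2
    leak_scale = 1.0 - leak_reduction
    return (1.0 - leak_fraction) * dyn_scale + leak_fraction * leak_scale


def breakeven_leak_fraction(leak_reduction, vdd, vdd_comp):
    """Leakage share above which RBB starts to save total power."""
    dyn_increase = ((vdd + vdd_comp) / vdd) ** 2 - 1.0
    return dyn_increase / (dyn_increase + leak_reduction)


# 50 mV compensation at 1.0 V with a placeholder 50% leakage reduction.
threshold = breakeven_leak_fraction(0.5, 1.0, 0.05)
low_leak = rbb_power_ratio(0.05, 0.5, 1.0, 0.05)    # leakage-poor case
high_leak = rbb_power_ratio(0.40, 0.5, 1.0, 0.05)   # leakage-rich case
```

Under these assumptions RBB costs power when leakage is a small share of the total and saves power when it is large, which is why the decision varies with process variation, DVFS state, and temperature.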

Figure 7. Inverter path delay increase by RBB. [Chart: speed slowdown (0.0% to 14.0%) versus RBB (0.1V to 0.4V) for 1.0V Vdd and 1.2V Vdd.]


Post-Silicon Thermal Optimization (2)

Exynos 4 Quad as Test Vehicle

Figure 8 presents the block diagram of Exynos 4 Quad, whose CPU consists of quad Cortex-A9-based cores running at up to 1.4GHz.

-To enhance computation performance, a GPU, multimedia processors, and interface units are integrated along with a 6.4GB/s dual-channel DRAM interface for wide memory bandwidth

Body bias is used in the quad-core CPU to optimize thermal throttling.

The thermal throttling evaluation board is shown in Figure 9.

Power can be measured by an external multimeter, and thermal throttling status and on-chip temperature are transferred to a PC via an RS-232-C connection on the fly.

Figure 8. Exynos 4 Quad block diagram. [Blocks: CPU core, L2 cache, GPU, DRAM controller (6.4GB/s dual channel), video, audio, file, image, camera, display; body bias domain #1 and body bias domain #2.]

Figure 9. Thermal throttling evaluation board. [Labels: Exynos 4 Quad, RS-232-C for mode control in PC, firmware, pins for power measurement.]


Post-Silicon Thermal Optimization (3)

Thermal Optimization Practice

Process variation and body bias control

-The use of RBB can be adjusted with respect to process variation

-Figure 10 shows total power measurement at various temperatures in high performance mode

-Fast silicon shows a steeper total power increase over temperature owing to leakage current

-At lower temperatures, RBB use results in a power increase owing to the voltage compensation of 50mV; the breakeven points are 65°C and 75°C in fast and slow silicon, respectively

-Because the typical temperature in high performance mode is over 75°C, RBB can be activated regardless of process variation

Figure 10. Total power saving by RBB use in (a) fast silicon and (b) slow silicon running at max speed. [Each panel plots normalized total power (0.8 to 1.6) versus junction temperature (20 to 120 °C) for Bypass and RBB, with the breakeven point marked.]
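A breakeven point like the ones marked in Figure 10 can be located from two measured power curves by linear interpolation between the samples where the RBB curve crosses below the bypass curve. The sketch below uses synthetic numbers, not the measured data of Figure 10, and the function name is hypothetical.

```python
def breakeven_temperature(temps, p_bypass, p_rbb):
    """Temperature where RBB power first drops to the bypass power.

    temps must be ascending; the crossing is interpolated linearly
    between neighboring samples. Returns None if the curves never cross.
    """
    diffs = [r - b for r, b in zip(p_rbb, p_bypass)]
    for i in range(1, len(temps)):
        d0, d1 = diffs[i - 1], diffs[i]
        if d0 > 0 and d1 <= 0:
            frac = d0 / (d0 - d1)   # position of the zero crossing
            return temps[i - 1] + frac * (temps[i] - temps[i - 1])
    return None


# Synthetic normalized total power versus junction temperature [C].
temps = [25, 45, 65, 85, 105]
bypass = [1.00, 1.05, 1.13, 1.27, 1.50]
rbb = [1.06, 1.09, 1.14, 1.22, 1.38]

t_break = breakeven_temperature(temps, bypass, rbb)
```

Comparing the breakeven temperature of each silicon group with the typical operating temperature gives the per-group RBB decision; groups that reach the same decision are candidates for a merged policy.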


Post-Silicon Thermal Optimization (4)

Thermal Optimization Practice

Thermal throttling improvement and total power saving

-Table 1 shows the thermal throttling improvement measured with RBB in a real application setup (running an OS)

-Time before throttling start is improved by up to 171.0%, which means that an application requiring only a short time in high performance mode may not experience performance loss by throttling

-In a real application, normal status is improved by up to 12.3% by using RBB

-Figure 11 shows the total power saving measured with RBB in 1.0GHz operation; the total power saving is up to 19.1%

-RBB efficacy clearly improves at high temperature and in fast silicon

Table 1. Performance improvement by RBB in real application setup (running at max speed)

                                               Slow silicon   Fast silicon
Time-before throttling start improvement [%]   82.0           171.0
Normal status improvement [%]                  7.0            12.3

Figure 11. Total power reduction by RBB use in running at 1.0GHz. [Bar chart: total power saving (0 to 25%) versus chip temperature (25, 75, 85, 105 ºC) for slow chip and fast chip.]

[Diagram labels: frequency versus time; mode transition, high performance mode, thermal throttling, time before throttling start.]


Conclusion

Thermal throttling has been used to obey the thermal limit that prevents skin burns while maximizing user experience in high performance mobile SoC

We have proposed a new thermal throttling method based on RBB, which spans from system-level design to post-silicon optimization

In system-level design, the cost and efficacy of RBB use are formulated for a high performance CPU

A body current calculator has been developed for robust design of RBB, and a thermal management scheme using RBB is presented

In post-silicon optimization, we have proposed an empirical policy to reduce engineering cost and maximize RBB efficacy in reducing thermal throttling

The proposed design and optimization have been applied to a commercial mobile SoC, Exynos 4 Quad in 32nm HKMG

The proposed method improves peak performance by up to 12.3% in fast silicon, and it can save up to 19.1% of total power in 1.0GHz operation

The 171% improvement of time before throttling start means that the proposed methodology can decrease the chance of thermal throttling when an application requires a short period of high performance


References

[1] D. Brooks and M. Martonosi, “Dynamic thermal management for high-performance microprocessors,” in Proc. ISCA, pp. 171-182, 2001.

[2] K. Skadron et al., “Temperature-aware microarchitecture: modeling and implementation,” ACM Transactions on Architecture and Code Optimization, Vol. 1, No. 1, pp. 94-125, 2004.

[3] A. Naveh et al., “Power and thermal management in the Intel Core Duo processor,” Intel Technology Journal, Vol. 10, No. 2, pp. 109-122, 2006.

[4] J. H. Choi, A. Bansal, M. Meterelliyoz, J. Murthy, and K. Roy, “Self-consistent approach to leakage power and temperature estimation to predict thermal runaway in FinFET circuits,” IEEE Transactions on Computer-Aided Design, Vol. 26, No. 11, pp. 2059-2068, 2007.

[5] J. W. Tschanz et al., “Adaptive body bias for reducing impacts of die-to-die and within-die parameter variations on microprocessor frequency and leakage,” IEEE Journal of Solid-State Circuits, Vol. 37, No. 11, pp. 1396-1402, 2002.

[6] D. Markovic, C. C. Wang, L. P. Alarcon, T.-T. Liu, and J. M. Rabaey, “Ultralow-power design in near-threshold region,” Proc. IEEE, Vol. 98, No. 2, pp. 237-252, 2010.

[7] Y. Wang et al., “A 4.0 GHz 291Mb voltage-scalable SRAM design in 32nm high-κ metal-gate CMOS with integrated power management,” in Proc. ISSCC, pp. 456-457, 2009.

[8] C.-H. Jan et al., “A 32nm SoC platform technology with 2nd generation high-k/metal gate transistors optimized for ultra low power, high performance, and high density product applications,” in Proc. IEDM, pp. 1-4, 2009.

[9] S. Borkar, T. Karnik, S. Narendra, A. Keshavarzi, and V. De, “Parameter variations and impact on circuits and microarchitecture,” in Proc. Design Automation Conference, pp. 338-342, June 2003.
