
ISSN 1292-862

TIMA Lab. Research Reports

TIMA Laboratory, 46 avenue Félix Viallet, 38000 Grenoble France

http://tima.imag.fr

Power Consumption reduction using dynamic control of Micro Processor performance

David RIOS-ARAMBULA, Aurélien BUHRIG and Marc RENAUDIN

TIMA Laboratory - 46 Avenue Félix Viallet 38031 Grenoble Cedex - France

http://tima.imag.fr {David.Rios, Aurelien.Buhrig, Marc.Renaudin}@imag.fr

Abstract. An alternative way to reduce power consumption using dynamic voltage scaling is presented. The originality of this approach is the modeling and simulation of a system where each application indicates its performance needs (in MIPS) to the operating system, which in turn is able to know the global speed requirements of the system to meet all real time application deadlines. To achieve this level of control, a co-processor is described, that receives a set point command from the OS, and manages a DC/DC converter implemented as a charge pump, in order to have the system speed fitting this set point. This architecture is especially suited for asynchronous processors but can be adapted for synchronous ones as well.

1 Introduction

With the growing demand for embedded and portable circuits that require longer autonomy and lower power consumption, research in low power design has become of great importance [1]. Systems built around microprocessors are a perfect example of this need. Their evolution requires a high integration of functions that demand a lot of computational power, which in turn weighs on battery consumption. Battery research has reached an excellent level, but progress in that domain tends to be small and slow. Several solutions exist to work around this problem. First, it is possible to optimize the circuit design. Transistor sizing is essential for low power circuits, but this technique has been well studied and is no longer sufficient to obtain the power efficiency that new circuits require. Another approach is to reduce the dynamic and static consumption of circuits, for which two techniques are used. The first, Adaptive Body Biasing (ABB), reduces the static energy consumed by digital circuits. An idle circuit has a static leakage current that is becoming comparable to the dynamic current used by the circuit, so reducing the static current is essential to obtain the required performance. The second technique, Dynamic Voltage Scaling (DVS), reduces the dynamic current consumed by the circuit: the supply voltage of the circuit is lowered or raised as needed in order to reduce the energy consumed. Both techniques are used in microprocessor design ([2], [3], [4]) and can be combined to obtain the maximum power saving [5]. A brief description of the DVS technique is presented in the following paragraphs.

1.1 DVS

As seen in [6] and [7], Dynamic Voltage Scaling was used for the first time in microprocessor systems. In a CMOS microprocessor, the energy consumed can be estimated with:

$E_{ops} \propto C \cdot V^{2}$ (1)

with C the switched capacitance and V the supply voltage [8]. Therefore, to minimize the energy consumed by the circuit, we can reduce the capacitance. This is done by applying aggressive low-power design techniques [9] such as clock gating [10]. The number of instructions executed by the application is also a parameter that must be kept in mind to achieve full energy optimization; it is commonly optimized by compilers. Finally, because energy depends on the square of the voltage, a small decrease of the voltage leads to a non-negligible energy reduction. The DVS technique performs this voltage reduction and takes advantage of the square relationship to optimize energy. With this technique it is possible to save up to 80% of power [11]. This energy reduction does not come without a price: scaling the voltage of a microprocessor changes the system speed. As seen in equation 2, with V the supply voltage and c a process-dependent constant, reducing the supply voltage reduces the maximum speed at which the device can operate.

$f_{max} \propto \dfrac{V - c}{V}$ (2)

This relation is most useful when the supply voltage can be adapted so that the speed is just sufficient to meet the application deadline. Hence, the operating system must use the application profile to manage the energy needed by the system and to control the supply voltage [12]. Achieving this kind of control requires an interaction between hardware, software and the operating system. Fig. 1 shows an application executed by the processor in a time slot T, which is the maximum time allocated to the execution of the application. Without control, the total consumed energy is greater; with control, the energy is lower but the execution time increases. Moreover, a processor also has periods of time where it does nothing [13]. In that case the processor is still consuming energy, and it could be set to an idle state that reduces the dynamic current to zero.
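The trade-off between energy and speed described in this section can be illustrated numerically. The following is a minimal sketch, assuming the proportionalities E_ops ∝ C·V² and f_max ∝ (V − c)/V with a constant switched capacitance; the values `V_MAX` and `C_PROC` and the function names are hypothetical and are not taken from the processor studied here.

```python
# Sketch of the Section 1.1 relations; V_MAX and C_PROC are hypothetical values.
V_MAX = 1.2   # assumed nominal supply voltage (V)
C_PROC = 0.4  # assumed process-dependent constant c (V)


def relative_energy(v, v_max=V_MAX):
    """Energy per operation relative to v_max, from E_ops ~ C*V^2 with C fixed."""
    return (v / v_max) ** 2


def relative_speed(v, v_max=V_MAX, c=C_PROC):
    """Maximum speed relative to v_max, from f_max ~ (V - c)/V."""
    return ((v - c) / v) / ((v_max - c) / v_max)


def voltage_for_speed(ratio, v_max=V_MAX, c=C_PROC):
    """Lowest supply voltage that still gives `ratio` (0 < ratio <= 1) of full speed."""
    k = ratio * (v_max - c) / v_max  # required value of (V - c)/V
    return c / (1.0 - k)             # solve (V - c)/V = k for V


v_half = voltage_for_speed(0.5)
print(f"half speed at {v_half:.2f} V, "
      f"relative energy per operation {relative_energy(v_half):.2f}")
```

With these assumed values, halving the required speed allows the supply to drop to about 0.6 V, where each operation costs about a quarter of the energy it costs at the nominal voltage.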

1.2 Asynchronous and synchronous processors

As described before, processors are perfect targets for the DVS technique. There are two kinds of processor technologies to consider: synchronous and asynchronous. Synchronous processors dominate the current processor market as well as research studies. They need a global clock that controls the whole system. When using DVS, if the supply voltage of the processor is changed, the clock frequency has to be changed accordingly. This means that the synchronization of the PLL has to be taken care of.

Fig. 1. Example of an application with and without DVS control

Asynchronous processors are becoming more popular on the market. They are very modular and inherently low power. In asynchronous processors there is no global clock; all the tasks are locally controlled by a handshake protocol. The supply voltage can therefore be reduced without having to adjust a clock frequency, which reduces the size of the circuit needed to control the energy consumed by the processor with a DVS technique. Consequently, asynchronous processors are excellent candidates for DVS design [14], and the power saving obtained is higher than in synchronous processors. Although our work is focused on asynchronous processors, all the ideas and methods proposed in this work can also be applied to synchronous ones. This method has proved to be very efficient in a system with two ASPRO processors [15] and gave a 60% energy reduction in a digital image processing application [16].

1.3 Contribution

This paper explains the implementation of a DVS technique in the design of a co-processor that controls the DC/DC regulator supplying a microprocessor. The co-processor dynamically controls the DC/DC regulator to scale the supply voltage, as well as the clock frequency for synchronous processors, in order to satisfy the computational needs of the applications. The idea of providing voltage scaling is not new, but this study presents two main contributions:

- Modeling the power management as a process control system.
- Describing a co-processor enabling loop-back control of the DC/DC regulator to achieve the speed required by the system.

[Fig. 1 graphic: Speed versus t within the time slot T, with marks t1 and t2; panels "Application without control" and "Application with DVS control"]


2 System overview and architecture

With a feedback system, it is possible to control the power consumed by the system with high accuracy while respecting all the specifications and needs of the applications executed by the processor. Each application that is to be executed indicates the speed it requires to the co-processor. This information can be inserted statically into the code at compile time or provided dynamically at run time. Adding the speeds required by the applications, together with the speed required by the operating system, gives the speed profile of the whole system, which is used to control the speed precisely. Consequently, fine-grain power management can be applied while real-time application deadlines are still met. Fig. 2 shows an example of application profiles and the set point applied to the system in order to operate correctly. Note that scheduling applications with this technique is an important issue, but it is outside the scope of this paper and is not treated here.
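The construction of the system speed profile by addition can be sketched as follows. This is only an illustration under simple assumptions: each profile is a list of required speeds in MIPS sampled on a common time grid, and the profile values and the name `system_set_point` are hypothetical.

```python
def system_set_point(profiles):
    """Add per-application speed requirements (MIPS) time step by time step.

    A profile shorter than the others is treated as requiring 0 MIPS afterwards.
    """
    length = max(len(p) for p in profiles)
    return [sum(p[i] for p in profiles if i < len(p)) for i in range(length)]


# Hypothetical requirements, one value per time step.
application_1 = [10, 10, 40, 40, 10]
application_2 = [0, 20, 20, 0, 0]

set_point = system_set_point([application_1, application_2])
print(set_point)
```

The speed required by the operating system itself could be passed as one more profile in the same list.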

Fig. 2. Dynamic set point during a software execution

Fig. 3 shows the block diagram of the system. The co-processor integrates an instruction counter and a clock. It calculates the real speed of the processor in MIPS (Million Instructions Per Second), averaged over a computation period. This speed is then compared to the set point given by the application, and the co-processor generates the appropriate digital command for the DC/DC.
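The speed measurement and comparison performed by the co-processor can be sketched in a few lines. This is a behavioral illustration only, not the hardware described in the paper; the function names and the example figures are assumptions.

```python
def measured_mips(instruction_count, period_us):
    """Average speed over one computation period.

    Instructions per microsecond are numerically equal to MIPS.
    """
    return instruction_count / period_us


def speed_error(set_point_mips, instruction_count, period_us):
    """Positive when the processor is slower than the set point."""
    return set_point_mips - measured_mips(instruction_count, period_us)


# Hypothetical sample: 500 instructions counted in a 10 microsecond period.
print(measured_mips(500, 10), speed_error(80, 500, 10))
```

A positive error asks the regulator for more voltage, a negative one for less.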

2.1 DC/DC

The DC/DC converter chosen is a simple charge pump that supplies the processor and is able to increase, decrease or maintain its output voltage. It is controlled by a 2-bit digital code. Fig. 4 shows the schematic of the charge pump. This type of DC/DC converter is very small, very simple and can deliver a strong output current.

[Fig. 2 graphic: Speed (MIPS) versus t for Application 1, Application 2, and the system load, which gives the set point]


Fig. 3. System architecture

The operation of the charge pump is very simple. A 2-bit command controls the N and P transistors. When the P transistor is ON, the capacitor C is charged and the power transistor increases the output voltage. If the N transistor is ON, the capacitor C is discharged, to decrease the output voltage.
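The behavior of the 2-bit command can be captured in a small behavioral model. The sketch below is an assumption-laden simplification: the bit encoding, the fixed voltage step `dv` and the battery voltage `v_bat` are hypothetical, and the real circuit charges and discharges a capacitor rather than stepping linearly.

```python
from enum import Enum


class Command(Enum):
    """Hypothetical encoding of the 2-bit command as (P on, N on)."""
    HOLD = (0, 0)      # both transistors off: output voltage maintained
    INCREASE = (1, 0)  # P on: capacitor charged, output voltage rises
    DECREASE = (0, 1)  # N on: capacitor discharged, output voltage falls


def charge_pump_step(v_out, command, v_bat=1.2, dv=0.01):
    """Return the output voltage after one control step, clamped to [0, v_bat]."""
    if command is Command.INCREASE:
        return min(v_out + dv, v_bat)
    if command is Command.DECREASE:
        return max(v_out - dv, 0.0)
    return v_out


v = 1.0
for cmd in (Command.INCREASE, Command.INCREASE, Command.HOLD, Command.DECREASE):
    v = charge_pump_step(v, cmd)
print(round(v, 2))
```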

Fig. 4. Charge pump

2.2 The control

There are many ways to control the DC/DC regulator (proportional, integral, derivative, fuzzy logic, etc.). Since the control is performed by the co-processor, we are free to choose how it is done. Fuzzy logic suits our design very well, but it has the drawback of needing a lot of computing resources. We have therefore chosen a PID controller. This kind of control gives a faster response and reduces the overshoot of the output. The DC/DC regulator can be considered as an ON/OFF system, whereas the PID produces a digital output value that cannot drive the regulator directly. The PID output is therefore translated into a time proportional control that sets how long the command is on within a time slot. This on-time is calculated from the PID output.
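The combination of a PID with a time proportional stage can be sketched as below. This is a generic textbook discrete PID, not the implementation simulated in this paper; the saturation value `u_max`, the mapping of the sign of the output to the P or N transistor, and all names are assumptions made for illustration.

```python
class PID:
    """Textbook discrete PID acting on the speed error (set point minus measured)."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, error, dt):
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def time_proportional(u, slot, u_max):
    """Translate the PID output into (transistor to switch on, on-time in the slot).

    Assumption: a positive output raises the voltage (P on) and a negative one
    lowers it (N on); the on-time grows with |u| and saturates at the full slot.
    """
    duty = min(abs(u) / u_max, 1.0)
    if u > 0:
        command = "P"
    elif u < 0:
        command = "N"
    else:
        command = "hold"
    return command, duty * slot


pid = PID(kp=1.0, ki=0.0, kd=0.0)
u = pid.update(error=20.0, dt=1.0)  # processor is 20 MIPS too slow
print(time_proportional(u, slot=10.0, u_max=40.0))
```

Keeping the ON/OFF regulator and modulating only the on-time leaves the power stage unchanged while still letting the software tune Kp, Ki and Kd.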

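[Fig. 4 graphic: charge pump supplied by Vbat, with transistors P and N and capacitors Cout and Cload]

[Fig. 3 graphic: blocks µ-Processor, Co-Processor and Regulator DC/DC (+ PLL); signals Setpoint, end_of_instruction, Vdd voltage, clock frequency and CLK]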


2.3 Real speed calculation

The processor executes instructions one by one, giving a flow of executed instructions. The number of instructions executed during a given time is the speed at which the processor is running. For synchronous processors, because of "wait on interrupt" instructions, the real speed is not always tied to the clock frequency. For asynchronous processors, each instruction takes exactly the time it needs, and this time generally differs from one instruction to another. Therefore, for both types of processors, a hardware component must be integrated to compute the real speed of the processor. There are two kinds of fluctuations in the calculated speed. The first comes from the nature of the instructions executed, which take different amounts of time. This fluctuation is not a problem: since the calculated speed is the real speed averaged over the period used to count the executed instructions, it remains small. The second occurs when the voltage (and the frequency, for synchronous processors) is changing. In this case, the speed of the processor starts changing and instructions are executed in a variable time. The co-processor samples the speed and obtains an average number of executed instructions over the period considered. Consequently, as shown in Fig. 5, when moving from speed S1 to speed S2, the speed calculated by the co-processor, Sc, differs from the real average speed of the processor. This wrong speed disturbs the control of the DC/DC regulator.

Fig. 5. Speed evolution during the transition

To solve this problem, two solutions can be used. The first is to memorize the last speed of the processor. Then, when the new sample is computed, the difference between the new speed Sc and the previous speed S1 is added to Sc, which lowers the error:

$S_r = S_c + (S_c - S_1)$ (3)

It is also possible to treat this error as a disturbance in the PID control and to tune the PID to compensate for it. The first solution requires more hardware, while with the second the real speed is not determined as precisely and the speed control is slower.
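The first solution, the correction S_r = S_c + (S_c − S_1), can be checked on a simple case. The sketch below assumes, purely for illustration, that the speed ramps linearly from S1 to S2 over exactly one sampling period; the function name and the numbers are hypothetical.

```python
def corrected_speed(s_c, s_1):
    """Correction of the sampled speed: S_r = S_c + (S_c - S_1)."""
    return s_c + (s_c - s_1)


# Hypothetical linear ramp from S1 = 50 MIPS to S2 = 100 MIPS over one period:
# the co-processor measures the average of the ramp.
s_1, s_2 = 50.0, 100.0
s_c = (s_1 + s_2) / 2
print(s_c, corrected_speed(s_c, s_1))
```

In this idealized case the measured average is 75 MIPS and the corrected value recovers the speed reached at the end of the period; for other transition shapes the correction only reduces the error.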

[Fig. 5 graphic: Speed versus t, with speed levels S1, S2 and Sc, instants t1 and t2, and period Tc]


3 Modeling and Simulation

The system has been simulated using the ELDO simulator. The goal of simulation is to understand how the system works and to quantify the power saved.

Fig. 6. Block diagram of the PID control

3.1 Control

The control is composed of two parts. The first counts the instructions at each edge of the end_of_instruction signal, which the processor generates each time it finishes an instruction. The second performs the actual control from the set point and this instruction count. In this way we can count the number of instructions executed in a time given by an external clock signal CLK. The PID control receives the coefficients Kp, Ki and Kd, with which the software can tune the PID as needed. In this first simulation approach we did not try to change the coefficients dynamically, but we leave this possibility open for future work. The PID control is not the main goal of this study and each targeted processor will require its own coefficients, so the complete study of the coefficients is not detailed in this paper. Fig. 6 shows the diagram of the PID itself. It includes the timed control block, which provides the actual control of the regulator. This block takes the output of the PID block and generates two signals for the P and N transistors of the regulator. Those signals are asserted for a variable time that depends on the PID output.

3.2 Processor

To obtain a more realistic simulation of the system, we modeled the Lutonium processor [17] as a VHDL-AMS block.

3.3 Regulator

As described before, we used a charge pump to provide the supply voltage. This charge pump can scale its output voltage with a 2-bit digital command. The regulator was described in ELDO using a low-leakage transistor model from the HCMOS9 technology of STMicroelectronics.

[Fig. 6 graphic: blocks Speed calculator, PID (coefficients Kp, Ki, Kd) and Time Proportional; inputs Set point, End of instruction and CLK; error ε; outputs P and N]


4 Benchmarking the global system

To quantify the power consumption gain, an example is presented in Fig. 7, where two tasks are used to illustrate the software load profile. The first task (Fig. 7.a) is an interrupt routine. It simulates a task that has to be processed at a given frequency. This is quite common in embedded systems and generally reduces the efficiency of traditional synchronous DVS processors because of the synchronization time of the PLL. The second task (Fig. 7.b) represents an application running on the processor. The last graph (Fig. 7.c) is the resulting profile that will be used. As we can see, the processor is not always active; when it is, it processes tasks at full speed (200 MIPS). As discussed before, the OS can take advantage of this idle time and reduce the supply voltage so that each task is computed over a longer time. The maximum time is set by the next task and has to be respected. Fig. 9.a shows the dynamic set point given by the operating system. Figs. 9.b and 9.c show the voltage at the processor and the resulting real speed after simulation.

Fig. 10 presents the simulation results for the profile shown in Fig. 7. With this DVS technique, the global system consumes about 18 µJ (Fig. 10.a), while at maximal speed the CPU would consume more than 33 µJ (Fig. 10.b) during the same period of time. The energy saving is therefore more than 45%.

Note that the last graph (Fig. 10.c) shows the real consumption of the processor without the DC/DC, which is about 10 µJ. This indicates that the charge pump used has an average efficiency of about 55%. Fig. 8 shows that the efficiency of this charge pump is between 45% and 70%, depending on the output voltage.

The global efficiency of the charge pump used is not as good as we could have expected, and the charge pump could be replaced or modified; nevertheless, the gain obtained is very good for this study. Considering a DC/DC with an 85% efficiency for this profile, the global energy consumption would be 11.8 µJ, which represents a 66% energy reduction.
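The figures quoted in this section can be retraced with a few lines of arithmetic. The sketch below uses only the rounded values given in the text; the variable names are ours. Note that with the rounded 33 µJ baseline the last saving evaluates to roughly 64%, slightly below the 66% quoted, presumably because the baseline is stated as "more than 33 µJ".

```python
# Rounded energies from the text (µJ); names are illustrative.
E_SYSTEM_DVS = 18.0  # global system with DVS (Fig. 10.a)
E_CPU_MAX = 33.0     # CPU at maximal speed (Fig. 10.b), quoted as "more than 33"
E_CPU_DVS = 10.0     # processor alone with DVS, without the DC/DC (Fig. 10.c)

saving = 1 - E_SYSTEM_DVS / E_CPU_MAX       # saving with the simulated charge pump
pump_efficiency = E_CPU_DVS / E_SYSTEM_DVS  # average charge pump efficiency
e_system_85 = E_CPU_DVS / 0.85              # global energy with an 85% efficient DC/DC
saving_85 = 1 - e_system_85 / E_CPU_MAX     # corresponding saving

print(f"saving: {saving:.1%}, pump efficiency: {pump_efficiency:.1%}")
print(f"85% DC/DC: {e_system_85:.1f} µJ, saving: {saving_85:.1%}")
```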

Fig. 7. Processor load versus time of the application

[Fig. 8 graphic: chart titled Energy/MIPS; horizontal axis 0 to 100 MIPS; left axis in mW, from 0 to 1.6E-02; right axis from 0% to 80%; series Global system, Processor and Efficiency]

Fig. 8. Power consumption of the global system and power delivered to the processor as a function of speed – Efficiency of the charge pump


Fig. 9. Set point given by the system, voltage supplied to the processor, and real speed of the processor

Fig. 10. Energy consumed with the profile shown before by: (a) the global system using DVS, (b) the CPU alone at maximum power without DC/DC, (c) the CPU alone with DVS

5 Conclusion and Future work

This work presents simulation results for a DVS system with feedback control for potentially unpredictable software loads. Future work will deal with increasing the efficiency of the DC/DC and with managing the bulk voltage to limit the static leakage.

References

1. Benini, L., Bogliolo, A., De Micheli, G.: “A Survey of Design Techniques for System-Level Dynamic Power Management”, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 8, N° 3, pp. 299–316, June 2000.

2. Pering, T., Burd, T., Brodersen, R.: “Dynamic Voltage Scaling and the Design of a Low-Power Microprocessor System”, Power-Driven Microarchitecture Workshop, in conjunction with Intl. Symposium on Computer Architecture, Barcelona, Spain, June 1998.

3. Pering, T., Burd, T., Brodersen, R.: “Voltage Scheduling in the lpARM Microprocessor System”, ISLPED‘00, Rapallo, Italy pp. 96–101, 2000.

4. Martin, Flautner, Mudge, Blaauw: “Combined dynamic voltage and adaptive body biasing for low power microprocessors under dynamic workload”, ICCAD’02



5. E. Labonne, G. Sicard, M. Renaudin, “Dynamic voltage Scaling and Adaptive Body Biasing study for Asynchronous design” http://tima.imag.fr, TIMA-RR--04/06-01-FR, TIMA Lab. Research Reports, 2004

6. K. Govil, E. Chan, H. Wasserman, “Comparing Algorithms for Dynamic Speed-Setting of a Low-Power CPU”, Proc. 1st Int’l Conference on Mobile Computing and Networking, Nov 1995.

7. M. Weiser, B. Welch, A. Demers, and S. Shenker, “Scheduling for reduced CPU energy,” Proc. 1st Symp. on Operating Systems Design and Implementation, pp. 13-23, Nov. 1994.

8. T. Burd and R. Brodersen, “Energy Efficient CMOS Microprocessor Design,” Proc. 28th Hawaii Int’l Conf. on System Sciences, 1995.

9. M. Srivastava, A. Chandrakasan, R. Brodersen. Predictive system shutdown and other architectural techniques for energy efficient programmable computation. IEEE Transactions on VLSI System, 4(1), 1996.

10. F. Emnett, M. Biegel “Power Reduction Through RTL Clock Gating” Synopsys User Group, San Jose, March 2000.

11. T. Pering, T. Burd, and R. W. Brodersen, “The Simulation and Evaluation of Dynamic Voltage Scaling Algorithms,” Proc. 1998 Int’l Symp. On Low Power Electronics Design.

12. Flautner, K.: “Automatic Monitoring for Interactive Performance and Power Reduction”, Dissertation, Michigan University, 2001.

13. Luca Benini, Alessandro Bogliolo, and Giovanni De Micheli, “A survey of design techniques for system-level dynamic power management”, IEEE Trans. on Very Large Scale Integration (VLSI) Systems, vol. 8, no. 3, June 2000, pp. 299-316.

14. A.J. Martin, “An asynchronous approach to energy-efficient computing and communication” SSGRR 2000, August 2000.

15. M. Renaudin, P. Vivet, F. Robin, "ASPRO: an asynchronous 16-bit RISC Microprocessor with DSP capabilities", ESSCIRC, Duisburg, Germany, 1999.

16. Christian Piguet, “Low Power Electronics Design”, CRC Press, ISBN 0-8493-1941-2, 2005

17. A.J. Martin, M. Nyström et al., “The Lutonium: A Sub-Nanojoule Asynchronous 8051 Microcontroller,” IEEE Int. Symp. Async. Systems and Circuits, May 2003.

18. Pedram, M.: “Design Technologies for Low Power VLSI”, Encyclopedia of Computer Science and Technology, Vol. 36. Marcel Dekker, Inc., pp. 73–96, 1997.
