
Evaluating Branch Prediction Using Two-level Perceptron Table

Luiz Vinicius Marra Ribas
PETROBRAS - Petróleo Brasileiro S.A.
TI / TI-E&P / STEP
Rio de Janeiro - RJ - Brazil
[email protected]

Ronaldo Augusto de Lara Gonçalves
UEM - Universidade Estadual de Maringá
DIN - Departamento de Informática
Maringá - PR - Brazil
[email protected]

Abstract

Nowadays, commercial processors are designed as superscalar architectures. These processors use branch prediction techniques to forecast, before execution, the code path that will be followed after each branch instruction. Branch prediction avoids pipeline stalls by anticipating the execution of instructions, providing high instruction-level parallelism. This work evaluates a recent approach to intelligent branch prediction based on neural networks. Multiple Perceptrons were organized in a two-level prediction table, indexed by the branch address in the first level and by the branch history pattern in the second level. Many configurations were examined, varying the number of lines and the associativity of the prediction table. This approach proved able to predict branches accurately, exceeding 98% accuracy in some cases.

1. Introduction

Branch instructions can reduce the parallelism of superscalar architectures because, at instruction fetch time, the branch direction and the target address are usually unknown. Branch predictors can predict the branch direction as well as the target address, avoiding interruptions of the instruction stream inside the pipeline and anticipating the fetch of the most probable path [1].

Nowadays, most of the widely used branch prediction techniques, and those that provide the best results, make use of a specific cache called the Branch Target Buffer (BTB) [2]. Each line of the BTB keeps information to identify branches (normally their address tags) and to supply predictions (normally counters or histories), target addresses and possibly some target instructions, so that the control flow can be redirected quickly.
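As an illustration, one BTB line as described above might be laid out as follows in C. The field names and widths are assumptions for exposition, not a layout prescribed by [2]:

/* Illustrative layout of one BTB line; names and widths are assumed. */
typedef struct {
    unsigned int tag;      /* identifies the branch by its address tag  */
    unsigned int target;   /* predicted branch target address           */
    unsigned char counter; /* 2-bit saturating counter or history bits  */
} btb_line_t;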

Yeh and Patt [3, 4] proposed organizing the BTB in several levels, allowing predictions based on the individual and correlated behavior of branch instructions. This technique was called Two-Level Adaptive Training Prediction. McFarling [5] and Evers et al. [6] also worked on two-level predictors. Recently, a new approach to branch prediction uses an engine based on the Perceptron neuron. However, more experiments are necessary to understand its real benefits.

In this work, we evaluated two-level prediction based on multiple Perceptrons under several conditions. This paper is organized as follows. Section 2 introduces fundamental concepts about neural networks. Section 3 presents our prediction models. Section 4 describes the simulation environment and section 5 shows the performance evaluation. Section 6 summarizes the main conclusions and proposes future work. Finally, the references used in this work appear in the last section.

2. Neural networks

The utilization of neural networks in several fields of human knowledge has grown significantly due to their singular features, such as learning, classification, prediction, optimization and approximation [7, 8, 9]. The fundamental component of neural networks is the artificial neuron, which is modeled on the biological neuron. The Perceptron is one of the first implemented models of an artificial neuron, originally conceived by Rosenblatt [7, 8].

Jimenez and Lin [10, 11] presented a Perceptron-based method for branch prediction in computer architectures that showed better performance than conventional methods based on 2-bit counters, reducing prediction errors by more than 14% over the Gshare predictor on traces composed of SPEC2000 benchmarks. Michaud and Seznec [12], Ribas [16] and Colin Egan [15] also worked on prediction based on the Perceptron. In the last one, Colin Egan used the HSA Simulator to work with two-level predictions, as we do. However, we used the SimpleScalar Tool Set [13] and SPEC benchmarks [14] in order to verify the real efficiency of the Perceptron when applied in specific situations. The next section presents our models.

3. Prediction models

In this work, the Perceptron is used to predict the outcome of the current branch instruction (the one pointed to by the program counter) in a continuous branch stream. The Perceptron takes the history of the most recently executed branches as input and predicts the current branch using its synaptic weights. After the branch actually executes, the real outcome is shifted into the history. The Perceptron then uses the branch history for training and learning, adjusting its synaptic weights.
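A minimal sketch of this engine in C, following the formulation of Jimenez and Lin [10, 11], is shown below. Each history bit is read as +1 (taken) or -1 (not taken); the history length, the threshold constant and all names are illustrative, and the weight saturation a hardware implementation would use is omitted:

#include <stdlib.h>

#define HIST_LEN 16                           /* history bits per Perceptron  */
#define THETA ((int)(1.93 * HIST_LEN + 14))   /* training threshold from [11] */

typedef struct {
    int w[HIST_LEN + 1];                      /* w[0] is the bias weight      */
} perceptron_t;

/* Output is the bias plus the dot product of the weights with the
 * history; the branch is predicted taken when the output is >= 0. */
static int perceptron_output(const perceptron_t *p, unsigned hist)
{
    int y = p->w[0];
    for (int i = 0; i < HIST_LEN; i++)
        y += ((hist >> i) & 1) ? p->w[i + 1] : -p->w[i + 1];
    return y;
}

/* After the branch resolves, train only when the prediction was wrong
 * or the output magnitude did not exceed the threshold. */
static void perceptron_train(perceptron_t *p, unsigned hist, int taken, int y)
{
    int t = taken ? 1 : -1;
    if (((y >= 0) != taken) || abs(y) <= THETA) {
        p->w[0] += t;
        for (int i = 0; i < HIST_LEN; i++)
            p->w[i + 1] += (((hist >> i) & 1) ? 1 : -1) * t;
    }
}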

The basic prediction model proposed in this work is called TLPT (Two-Level Perceptron Table) and uses two tables. The first one keeps branch histories and is indexed directly by the branch address. The same branch history can appear many times in the first table, in any entry, forming a history pattern. Each entry of this first table addresses an entry of the second table, which keeps the synaptic weights of the Perceptron associated with the respective history pattern.

There are two variations of the TLPT: Global TLPT and Local TLPT. The Global TLPT predictor uses a global register that can be accessed by any Perceptron. This register keeps the branch history of the whole program, independently of branch addresses or history patterns. The Local TLPT predictor uses a private history for each Perceptron of the second table. Each local history keeps the branch outcomes related to some first-level pattern. Notice that the global predictor is trained to predict the next outcome of the global history, while the local one is trained to predict the next outcome of the history associated with its branch history pattern.
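One plausible reading of this organization, continuing the C sketch above: the table sizes, the index hashing and the field names are assumptions for illustration, not the exact implementation evaluated in this paper.

#define L1_LINES 1024    /* first level: history patterns, indexed by address */
#define L2_LINES 1024    /* second level: one Perceptron per history pattern  */

typedef struct {
    unsigned hist[L1_LINES];        /* per-branch history patterns (level 1) */
    perceptron_t pht[L2_LINES];     /* Perceptron weights (level 2)          */
    unsigned local_hist[L2_LINES];  /* Local TLPT: private history per entry */
    unsigned global_hist;           /* Global TLPT: single shared register   */
    int use_global;                 /* selects the Global or Local variant   */
} tlpt_t;

/* Level 1 is indexed directly by the branch address; the stored history
 * pattern then selects the Perceptron in level 2, whose input is either
 * the global register or the entry's private local history. */
static int tlpt_predict(tlpt_t *t, unsigned pc, int *y)
{
    unsigned pattern = t->hist[(pc >> 2) % L1_LINES];
    unsigned idx = pattern % L2_LINES;
    unsigned input = t->use_global ? t->global_hist : t->local_hist[idx];
    *y = perceptron_output(&t->pht[idx], input);
    return *y >= 0;
}

/* After resolution: train the selected Perceptron, then shift the real
 * outcome into the first-level pattern and into the history register. */
static void tlpt_update(tlpt_t *t, unsigned pc, int taken, int y)
{
    unsigned l1 = (pc >> 2) % L1_LINES;
    unsigned idx = t->hist[l1] % L2_LINES;
    unsigned input = t->use_global ? t->global_hist : t->local_hist[idx];
    perceptron_train(&t->pht[idx], input, taken, y);
    t->hist[l1] = (t->hist[l1] << 1) | (taken ? 1u : 0u);
    if (t->use_global)
        t->global_hist = (t->global_hist << 1) | (taken ? 1u : 0u);
    else
        t->local_hist[idx] = (t->local_hist[idx] << 1) | (taken ? 1u : 0u);
}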

4. Simulation environment

The SimpleScalar Tool Set [13] is widely used to evaluate superscalar architectures. It is composed of a library, a compiler, a debugger and several simulators. The sim-bpred simulator is open source and is appropriate for evaluating well-known branch prediction techniques. The inputs to sim-bpred are a benchmark (compiled for the MIPS instruction subset) and parameters that configure the predictor. In our work, sim-bpred was modified to include the TLPT predictor, which uses the basic engine implemented by Jimenez and Lin [10, 11].

The TLPT predictor was evaluated on 8 SPEC benchmarks [14] (4 integer: cc1, perl, ijpeg and li; and 4 floating-point: mgrid, fpppp, swim and wave5). Several sizes and associativities of the Perceptron table were considered for both models. The history size was also varied from 2 to 64 bits. The next section describes and analyzes the results.

5. Performance evaluation

Firstly, we simulated the TLPT model varying the Perceptron table size and the history size. Figures 1 and 2 show the results for the global and local versions, respectively. The performance of the Global TLPT grows quickly as the history goes from 2 to 64 bits. This happens because all branches of the program modify the global history; thus, the global history needs to be long for the Perceptron to learn more about the patterns. Differently, the Local TLPT changes only slightly. Because the local histories are limited to small code regions, which have few branches, increasing the history size does not significantly increase performance.

Figure 1. Global TLPT: table x history sizes

Figure 2. Local TLPT: table x history sizes


We can see that performance increases as the number of lines varies from 64 to 1024. The best result arises when there are 1024 Perceptrons in the table. In this case, the average accuracy rate over all history sizes reaches about 95.83% for the global predictor and 94.73% for the local predictor. The improvement of the best case over the worst case reaches 2.67% for the Global TLPT and 6.34% for the Local TLPT.

Figures 3 and 4 show the performance results by branch type, for the Global and Local TLPT, respectively, using 1024 Perceptrons. The global predictor reaches an average accuracy over all history sizes of about 95.61% for forward branches and 93.63% for backward branches; its average performance, independently of branch type, reaches about 95.82%. The local predictor reaches an average accuracy of about 93.84% for forward branches and 91.98% for backward branches, which are mutually exclusive; its average performance, independently of branch type, reaches about 94.73%. In this case, the global predictor is a little better than the local one.

Figure 3. Global TLPT: per-branch results

Figure 4. Local TLPT: per-branch results

Figures 5 and 6 show the performance results by benchmark, for the Global TLPT and Local TLPT, respectively, also using 1024 Perceptrons. The global predictor reached its best results on the benchmarks swim, wave5 and mgrid, exceeding 98% average accuracy over all history sizes. The worst case for both predictors was obtained on cc1: about 86.34% for the global predictor and 83.91% for the local predictor. In all situations, the global predictor was a little better than the local predictor.

In addition, we measured the impact of associativity on the Perceptron table. Figures 7 and 8 show the results per history size for the Global TLPT and Local TLPT, respectively. We fixed the number of Perceptrons at 1024, but the table configuration changed from 1024/1 to 64/16, expressed as "number of lines/associativity".

Figure 5. Global TLPT: per-benchmark results

Figure 6. Local TLPT: per-benchmark results

The global predictor loses performance as the associativity grows, whereas the local predictor reaches better performance with high associativity. In fact, the global predictor tries to learn the history patterns of the whole program, so interference among branches has a certain impact. On the other hand, the local predictor tries to learn the local patterns, so interference among branches is less important. In this case, using high associativity reduces this interference, leaving room in the table for a greater number of local histories.
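A sketch of how such a set-associative Perceptron table might be indexed, continuing in C; the set/tag split and the replacement policy are assumptions for illustration:

#include <string.h>

/* With the Perceptron count fixed at 1024, a configuration such as
 * 64/16 means 64 sets of 16 ways. A set is selected by the low bits
 * of the history pattern; a way is found (or allocated) by matching
 * a tag formed from the remaining bits. */
typedef struct {
    unsigned tag;
    perceptron_t p;
} way_t;

static perceptron_t *assoc_lookup(way_t table[], int lines, int assoc,
                                  unsigned pattern)
{
    unsigned set = pattern % lines;
    unsigned tag = pattern / lines;
    way_t *base = &table[set * assoc];
    for (int i = 0; i < assoc; i++)
        if (base[i].tag == tag)
            return &base[i].p;
    /* Miss: reuse way 0 and clear its weights; a real design would
     * use LRU or a similar replacement policy. */
    base[0].tag = tag;
    memset(&base[0].p, 0, sizeof base[0].p);
    return &base[0].p;
}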

Figure 7. Global TLPT: table x associativity

Figure 8. Local TLPT: table x associativity

6. Conclusions and future work

This work evaluated the performance of branch predictors based on multiple Perceptrons organized in two-level tables. The experiments showed that this approach ensures a high accuracy rate and could be used on superscalar processors.

We simulated two versions of this predictor, Global TLPT and Local TLPT, under different configurations of the Perceptron table. In addition, we considered variations of the history size and analyzed performance by instruction type and by benchmark.

We conclude that the global predictor is a little more efficient than the local predictor, and that the best performance was reached using 1024 Perceptrons distributed linearly in the table. Moreover, the main advantage is that the global predictor requires less hardware than the local predictor, since it needs just one history register.

7. Acknowledgements

The authors would like to thank CNPq, the ARAUCÁRIA Foundation and PETROBRAS S.A. for their financial support.

References

[1] SMITH, J. E.; SOHI, G. S. The Microarchitecture of Superscalar Processors. Proceedings of the IEEE, p. 1609-1624, Dec. 1995.
[2] BRAY, B. K.; FLYNN, M. J. Strategies for Branch Target Buffers. ACM - Association for Computing Machinery, Jun. 1991, p. 42-50.
[3] YEH, T.; PATT, Y. N. Two-Level Adaptive Training Branch Prediction. The 24th ACM/IEEE International Symposium and Workshop on Microarchitecture, Nov. 1991.
[4] YEH, T.; PATT, Y. N. Alternative Implementations of Two-Level Adaptive Branch Prediction. The 19th Annual International Symposium on Computer Architecture, Gold Coast, Australia, May 1992, p. 124-134.
[5] MCFARLING, S. Combining Branch Predictors. Technical Report TN-36, Digital Western Research Laboratory, Jun. 1993.
[6] EVERS, M. et al. An Analysis of Correlation and Predictability: What Makes Two-Level Branch Predictors Work. ISCA-25, 1998.
[7] HAYKIN, S. Neural Networks. Prentice Hall, New York, 1994.
[8] KOVÁCS, Z. L. Artificial Neural Networks: Fundamentals and Applications (in Portuguese). Second Edition, Collegium Cognitio, Chapters 1-5, p. 13-90, 1996.
[9] RUSSELL, S. J.; NORVIG, P. Artificial Intelligence: A Modern Approach. Prentice Hall Inc., p. 563-597, 1995.
[10] JIMÉNEZ, D. A.; LIN, C. Perceptron Learning for Predicting the Behavior of Conditional Branches. Technical Report TR2000-08, University of Texas at Austin, 2000.
[11] JIMÉNEZ, D. A.; LIN, C. Dynamic Branch Prediction with Perceptrons. Proceedings of the Seventh International Symposium on High-Performance Computer Architecture, Jan. 2001.
[12] MICHAUD, P.; SEZNEC, A. A Comprehensive Study of Dynamic Global History Branch Prediction. Technical Report RR-4219, INRIA, Rennes, Jun. 2001.
[13] BURGER, D.; AUSTIN, T. M. The SimpleScalar Tool Set, Version 2.0. TR#1342, Computer Sciences Department, University of Wisconsin-Madison, Jun. 1997.
[14] SPEC. The SPEC Benchmark Homepage.
[15] EGAN, C. et al. Two-Level Branch Prediction Using Neural Networks. Journal of Systems Architecture, Elsevier, 49 (2003), p. 557-570.
[16] RIBAS, L. V. M.; FIGUEIREDO, M. F.; GONÇALVES, R. A. L. Use of Neural Networks in the Branch Prediction of Superscalar Architectures. Master's Dissertation, Federal University of Paraná, Brazil, proposed 08/23/2002, defended 08/29/2003.
