References
[1] A. Agarwal, S. D. Pudar, "Column-Associative Caches: A Technique for Reducing the Miss Rate of Direct-Mapped Caches," ISCA-93: ACM/IEEE International Symposium on Computer Architecture, pp. 179-180, San Diego, CA, May 1993.
[2] D. H. Albonesi, "Selective Cache Ways: On-Demand Cache Resource Allocation," IEEE International Symposium on Microarchitecture, pp. 248-259, Haifa, Israel, November 1999.
[3] M. Alidina, J. Monteiro, S. Devadas, A. Ghosh, M. Papaefthymiou, "Precomputation-Based Sequential Logic Optimization for Low Power," IEEE Transactions on VLSI Systems, Vol. 2, No. 4, pp. 426-436, December 1994.
[4] ARM Corporation, ARM Software Development Toolkit, Version 2.50, Reference Guide, ARM DUI 0041C, Chapter 12, November 1998.
[5] Artisan Components, Process-Perfect SRAM Generator Datasheet, http://www.artisan.com, 1999.
[6] R. I. Bahar, E. T. Lampe, E. Macii, "Power Optimization of Technology-Dependent Circuits Based on Symbolic Computation of Logic Implications," ACM Transactions on Design Automation of Electronic Systems, Vol. 5, No. 3, pp. 267-293, July 2000.
[7] R. I. Bahar, H. Cho, G. D. Hachtel, E. Macii, F. Somenzi, "Symbolic Timing Analysis and Re-Synthesis for Low Power of Combinational Circuits Containing False Paths," IEEE Transactions on CAD/ICAS, Vol. 16, No. 10, pp. 1101-1115, October 1997.
[8] R. I. Bahar, G. Albera, S. Manne, "Power and Performance Tradeoffs Using Various Caching Strategies," ISLPED-98: ACM/IEEE International Symposium on Low Power Electronics and Design, pp. 64-69, Monterey, CA, August 1998.
[9] R. S. Bajwa, M. Hiraki, H. Kojima, D. J. Gorny, K. Nitta, A. Shridhar, K. Seki, K. Sasaki, "Instruction Buffering to Reduce Power in Processors for Signal Processing," IEEE Transactions on VLSI Systems, Vol. 5, No. 4, pp. 417-424, December 1997.
MEMORY DESIGN TECHNIQUES
[10] N. Bellas, I. Hajj, C. Polychronopoulos, G. Stamoulis, "Architectural and Compiler Support for Energy Reduction in the Memory Hierarchy of High Performance Microprocessors," ISLPED-98: ACM/IEEE International Symposium on Low Power Electronics and Design, pp. 64-69, Monterey, CA, August 1998.
[11] L. Benini, P. Siegel, G. De Micheli, "Automatic Synthesis of Gated Clocks for Power Reduction in Sequential Circuits," IEEE Design and Test of Computers, Vol. 11, No. 4, pp. 32-40, December 1994.
[12] L. Benini, G. De Micheli, "State Assignment for Low Power Dissipation," IEEE Journal of Solid-State Circuits, Vol. 30, No. 3, pp. 258-268, March 1995.
[13] L. Benini, G. De Micheli, "Transformation and Synthesis of FSMs for Low Power Gated Clock Implementation," IEEE Transactions on CAD/ICAS, Vol. 15, No. 6, pp. 630-643, June 1996.
[14] L. Benini, G. De Micheli, E. Macii, D. Sciuto, C. Silvano, "Asymptotic Zero-Transition Activity Encoding for Address Busses in Low-Power Microprocessor-Based Systems," GLS-VLSI-97: IEEE/ACM 7th Great Lakes Symposium on VLSI, pp. 77-82, Urbana-Champaign, IL, March 1997.
[15] L. Benini, G. De Micheli, E. Macii, D. Sciuto, C. Silvano, "Address Bus Encoding Techniques for System-Level Power Optimization," DATE-98: IEEE Design Automation and Test in Europe, pp. 861-866, Paris, France, February 1998.
[16] L. Benini, G. De Micheli, E. Macii, M. Poncino, S. Quer, "Power Optimization of Core-Based Systems By Address Bus Encoding," IEEE Transactions on VLSI Systems, Vol. 6, No. 4, pp. 554-562, December 1998.
[17] L. Benini, G. De Micheli, Dynamic Power Management of Electronic Systems, Kluwer Academic Publishers, 1998.
[18] L. Benini, F. Vermeulen, G. De Micheli, "Finite State Machine Partitioning for Low Power Consumption," ISCAS-98: IEEE International Symposium on Circuits and Systems, Vol. 2, pp. 5-8, Monterey, CA, May 1998.
[19] L. Benini, G. De Micheli, A. Lioy, E. Macii, G. Odasso, M. Poncino, "Synthesis of Power-Managed Sequential Components Based on Computational Kernel Extraction," IEEE Transactions on CAD/ICAS, Vol. 20, No. 9, pp. 1118-1131, September 2001.
[20] L. Benini, G. De Micheli, E. Macii, M. Poncino, R. Scarsi, "Symbolic Synthesis of Clock-Gating Logic for Power Optimization of Synchronous Controllers," ACM Transactions on Design Automation of Electronic Systems, Vol. 4, No. 4, pp. 351-375, October 1999.
[21] L. Benini, G. Paleologo, A. Bogliolo, G. De Micheli, "Policy Optimization for Dynamic Power Management," IEEE Transactions on CAD/ICAS, Vol. 18, No. 6, pp. 813-833, June 1999.
[22] L. Benini, A. Macii, E. Macii, M. Poncino, R. Scarsi, "Architectures and Synthesis Algorithms for Power-Efficient Bus Interfaces," IEEE Transactions on CAD/ICAS, Vol. 19, No. 9, pp. 969-980, September 2000.
[23] L. Benini, G. De Micheli, A. Macii, E. Macii, M. Poncino, R. Scarsi, "Glitch Power Minimization by Selective Gate Freezing," IEEE Transactions on VLSI Systems, Vol. 8, No. 3, pp. 287-299, June 2000.
[24] L. Benini, G. Castelli, A. Macii, E. Macii, R. Scarsi, "Battery-Driven Dynamic Power Management of Portable Systems," ISSS-00: IEEE International Symposium on System Synthesis, pp. 25-30, Madrid, Spain, September 2000.
[25] L. Benini, A. Bogliolo, G. De Micheli, "A Survey of Design Techniques for System-Level Dynamic Power Management," IEEE Transactions on VLSI Systems, Vol. 8, No. 3, pp. 299-316, June 2000.
[26] L. Benini, G. De Micheli, "System-Level Power Optimization: Techniques and Tools," ACM Transactions on Design Automation of Electronic Systems, Vol. 5, No. 2, pp. 115-192, April 2000.
[27] L. Benini, A. Macii, E. Macii, M. Poncino, "Increasing Energy Efficiency of Embedded Systems by Application-Specific Memory Hierarchy Generation," IEEE Design and Test of Computers, Vol. 17, No. 2, pp. 74-85, April 2000.
[28] L. Benini, A. Macii, E. Macii, M. Poncino, "Selective Instruction Compression for Memory Energy Reduction in Embedded Systems," ISLPED-99: ACM/IEEE International Symposium on Low Power Electronics and Design, pp. 206-211, San Diego, CA, August 1999.
[29] L. Benini, A. Macii, M. Poncino, "A Recursive Algorithm for Low-Power Memory Partitioning," ISLPED-00: ACM/IEEE International Symposium on Low Power Electronics and Design, pp. 78-83, Rapallo, Italy, July 2000.
[30] L. Benini, A. Macii, A. Nannarelli, "Cached-Code Compression for Energy Minimization in Embedded Processors," ISLPED-01: ACM/IEEE International Symposium on Low Power Electronics and Design, pp. 322-327, Huntington Beach, CA, August 2001.
[31] L. Benini, L. Macchiarulo, A. Macii, E. Macii, M. Poncino, "From Architecture to Layout: Partitioned Memory Synthesis for Embedded Systems-on-Chip," DAC-38: ACM/IEEE Design Automation Conference, pp. 784-789, Las Vegas, NV, June 2001.
[32] M. Borgatti, et al., "A 64-Min Single-Chip Voice Recorder/Player Using Embedded 4-b/cell FLASH Memory," IEEE Journal of Solid-State Circuits, Vol. 36, No. 3, pp. 516-521, March 2001.
[33] D. C. Burger, J. R. Goodman, A. Kagle, "Limited Bandwidth to Affect Processor Design," IEEE Micro, Vol. 17, No. 6, pp. 55-62, November-December 1997.
[34] D. C. Burger, Hardware Techniques to Improve the Performance of the Processor/Memory Interface, Ph.D. Dissertation, University of Wisconsin-Madison, 1998.
[35] P. Cappelletti, C. Golla, P. Olivo, E. Zanoni, Flash Memories, Kluwer Academic Publishers, 1999.
[36] F. Catthoor, S. Wuytack, E. De Greef, F. Balasa, L. Nachtergaele, A. Vandecappelle, Custom Memory Management Methodology: Exploration of Memory Organization for Embedded Multimedia System Design, Kluwer Academic Publishers, 1998.
[37] A. Chandrakasan, S. Sheng, R. W. Brodersen, "Low-Power CMOS Digital Design," IEEE Journal of Solid-State Circuits, Vol. 27, No. 4, pp. 473-484, April 1992.
[38] A. Chandrakasan, W. Bowhill, F. Fox, Design of High-Performance Microprocessor Circuits, IEEE Press, 2001.
[39] A. P. Chandrakasan, M. Potkonjak, R. Mehra, J. Rabaey, R. W. Brodersen, "Optimizing Power Using Transformations," IEEE Transactions on CAD/ICAS, Vol. 14, No. 1, pp. 12-31, January 1995.
[40] J. M. Chang, M. Pedram, "Low Power Register Allocation and Binding," DAC-32: ACM/IEEE Design Automation Conference, pp. 29-35, San Francisco, CA, June 1995.
[41] J. M. Chang, M. Pedram, "Module Assignment for Low Power," EuroDAC-96: IEEE European Design Automation Conference, pp. 376-381, Geneva, Switzerland, September 1996.
[42] H. Chang, et al., Surviving the SoC Revolution: A Guide to Platform-Based Design, Kluwer Academic Publishers, 1999.
[43] S. Y. Chiang, "Foundries and the Dawn of an Open IP Era," IEEE Computer, Vol. 34, No. 4, pp. 43-46, April 2001.
[44] S. H. Chow, Y. C. Ho, T. Hwang, C. L. Liu, "Lower Power Realization of Finite State Machines - A Decomposition Approach," ACM Transactions on Design Automation of Electronic Systems, Vol. 1, No. 3, pp. 315-340, July 1996.
[45] S. L. Coumeri, D. E. Thomas, "Memory Modeling for System Synthesis," ISLPED-98: ACM/IEEE International Symposium on Low Power Electronics and Design, pp. 179-184, Monterey, CA, August 1998.
[46] S. L. Coumeri, Modeling Memory Organizations for the Synthesis of Low Power Systems, Ph.D. Dissertation, EE and CS Dept., Carnegie Mellon University, May 1999.
[47] J. Davis II, et al., Overview of the Ptolemy Project, ERL Technical Report UCB/ERL No. M99/37, UC Berkeley, 1999.
[48] Dolphin Integration, Ragtime Embedded Memory Generators, 2001.
[49] A. Farrahi, G. Tellez, M. Sarrafzadeh, "Memory Segmentation to Exploit Sleep Mode Operation," DAC-32: ACM/IEEE Design Automation Conference, pp. 36-41, San Francisco, CA, June 1995.
[50] A. Farrahi, M. Sarrafzadeh, "System Partitioning to Maximize Sleep Time," ICCAD-95: IEEE/ACM International Conference on Computer-Aided Design, pp. 452-455, San Jose, CA, November 1995.
[51] B. R. Fisk, R. I. Bahar, "The Non-Critical Buffer: Using Load Latency Tolerance to Improve Data Cache Efficiency," ICCD-99: IEEE International Conference on Computer Design, pp. 538-545, Austin, TX, October 1999.
[52] D. Flynn, "AMBA: Enabling Reusable On-Chip Designs," IEEE Micro, Vol. 17, No. 4, pp. 20-27, July-August 1997.
[53] D. Frank, R. Dennard, E. Novak, P. Solomon, Y. Taur, H. S. Wong, "Device Scaling Limits of Si MOSFETs and Their Application Dependencies," Proceedings of the IEEE, Vol. 89, No. 3, pp. 259-288, March 2001.
[54] D. D. Gajski, N. D. Dutt, A. C. H. Wu, S. Y.-L. Lin, High-Level Synthesis - Introduction to Chip and System Design, Kluwer Academic Publishers, 1992.
[55] Gartner, Inc., Final 2000 Worldwide Semiconductor Market Share, 2000.
[56] C. Gebotys, "Low Energy Memory and Register Allocation Using Network Flow," DAC-34: ACM/IEEE Design Automation Conference, pp. 435-440, Anaheim, CA, June 1997.
[57] J. D. Gee, M. D. Hill, D. N. Pnevmatikatos, A. J. Smith, "Cache Performance of the SPEC Benchmark Suite," IEEE Micro, Vol. 13, No. 4, pp. 17-27, August 1993.
[58] Goldman-Sachs Technical Report, Wireless Wave II - The Data Wave Unplugged, 1999.
[59] A. Gonzalez, C. Aliagas, M. Valero, "A Data-Cache with Multiple Caching Strategies Tuned to Different Types of Locality," ICS-95: ACM International Conference on Supercomputing, pp. 338-347, Barcelona, Spain, July 1995.
[60] P. Grun, N. Dutt, A. Nicolau, "Access Pattern Based Local Memory Customization for Low-Power Embedded Systems," DATE-01: IEEE Design Automation and Test in Europe, pp. 778-784, Munich, Germany, March 2001.
[61] M. Gumm, VHDL-Modeling and Synthesis of the DLX RISC Processor, University of Stuttgart, Department of Integrated Systems Engineering, Stuttgart, Germany, 1995.
[62] G. D. Hachtel, M. Hermida, A. Pardo, M. Poncino, F. Somenzi, "Re-Encoding Sequential Circuits to Reduce Power Dissipation," ICCAD-94: IEEE/ACM International Conference on Computer-Aided Design, pp. 70-73, San Jose, CA, November 1994.
[63] A. Hasegawa, et al., "SH3: High Code Density, Low Power," IEEE Micro, Vol. 15, No. 6, pp. 11-19, December 1995.
[64] J. L. Hennessy, D. A. Patterson, Computer Architecture - A Quantitative Approach, II Edition, Morgan Kaufmann Publishers, 1996.
[65] C. H. Hwang, A. C. H. Wu, "A Predictive System Shutdown Method for Energy Saving of Event-Driven Computation," ICCAD-97: IEEE/ACM International Conference on Computer-Aided Design, pp. 28-32, San Jose, CA, November 1997.
[66] IBM Blue Logic Technology, http://www.chips.ibm.com/bluelogic
[67] S. Iman, M. Pedram, "Multi-Level Network Optimization for Low Power," ICCAD-94: IEEE/ACM International Conference on Computer-Aided Design, pp. 372-377, San Jose, CA, November 1994.
[68] S. Iman, M. Pedram, "POSE: Power Optimization and Synthesis Environment," DAC-33: ACM/IEEE Design Automation Conference, pp. 21-26, Las Vegas, NV, June 1996.
[69] K. Inoue, T. Ishihara, K. Murakami, "Way-Predicting Set-Associative Cache for High-Performance and Low-Energy Consumption," ISLPED-99: ACM/IEEE International Symposium on Low Power Electronics and Design, pp. 273-275, San Diego, CA, August 1999.
[70] L. K. John, A. Subramanian, "Design and Performance Evaluation of a Cache Assist to Implement Selective Caching," ICCD-97: IEEE International Conference on Computer Design, pp. 510-518, Austin, TX, October 1997.
[71] N. Jouppi, "Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Pre-Fetch Buffer," ISCA-90: ACM/IEEE International Symposium on Computer Architecture, pp. 364-373, Seattle, WA, May 1990.
[72] M. B. Kamble, K. Ghose, "Analytical Energy Dissipation Models for Low-Power Caches," ISLPED-97: ACM/IEEE International Symposium on Low Power Electronics and Design, pp. 143-148, Monterey, CA, August 1997.
[73] G. Kane, J. Heinrich, MIPS RISC Architecture, Prentice Hall, 1994.
[74] D. Keitel-Schulz, N. Wehn, "Embedded DRAM Development: Technology, Physical Design and Application Issues," IEEE Design and Test of Computers, Vol. 18, No. 3, pp. 7-15, May-June 2001.
[75] K. Keutzer, A. Newton, J. Rabaey, A. Sangiovanni-Vincentelli, "System-Level Design: Orthogonalization of Concerns and Platform-Based Design," IEEE Transactions on CAD/ICAS, Vol. 19, No. 12, pp. 1523-1543, December 2000.
[76] D. Kim, K. Choi, "Power-Conscious High-Level Synthesis Using Loop Folding," DAC-34: ACM/IEEE Design Automation Conference, pp. 441-445, Anaheim, CA, June 1997.
[77] J. Kin, M. Gupta, W. Mangione-Smith, "The Filter Cache: An Energy Efficient Memory Structure," MICRO-30: Annual IEEE/ACM International Symposium on Microarchitecture, pp. 184-193, Research Triangle Park, NC, December 1997.
[78] D. Kirovski, C. Lee, M. Potkonjak, W. Mangione-Smith, "Synthesis of Power Efficient Systems-on-Silicon," ASP-DAC-98: IEEE Asian and South Pacific Design Automation Conference, pp. 557-562, Yokohama, Japan, February 1998.
[79] U. Ko, P. T. Balsara, A. K. Nanda, "Energy Optimization of Multilevel Cache Architectures for RISC and CISC Processors," IEEE Transactions on VLSI Systems, Vol. 6, No. 2, pp. 299-308, June 1998.
[80] S. Komatsu, M. Ikeda, K. Asada, "Low Power Chip Interface Based on Bus Data Encoding with Adaptive Code-Book Method," GLS-VLSI-99: ACM/IEEE Great Lakes Symposium on VLSI, pp. 368-371, Ypsilanti, MI, March 1999.
[81] B. Kumthekar, I. H. Moon, F. Somenzi, "A Symbolic Algorithm for Low-Power Sequential Synthesis," ISLPED-97: ACM/IEEE International Symposium on Low Power Electronics and Design, pp. 56-61, Monterey, CA, August 1997.
[82] A. Kunimatsu, et al., "Vector Unit Architecture for Emotion Synthesis," IEEE Micro, Vol. 20, No. 2, pp. 40-47, March-April 2000.
[83] Intel, "Intel XScale™ Microarchitecture Technical Summary," http://www.intel.com/design/intelxscale.
[84] K. Itoh, K. Sasaki, Y. Nakagome, "Trends in Low-Power RAM Circuit Technologies," Proceedings of the IEEE, Vol. 83, No. 4, pp. 524-543, April 1995.
[85] G. Jackson, et al., "An Analog Record, Playback and Processing System on a Chip for Mobile Communications Devices," IEEE Custom Integrated Circuits Conference, pp. 99-102, San Diego, CA, May 1999.
[86] T. Juan, T. Lang, J. J. Navarro, "Reducing TLB Power Requirements," ISLPED-97: ACM/IEEE International Symposium on Low Power Electronics and Design, pp. 196-201, Monterey, CA, August 1997.
[87] P. Laramie, Instruction-Level Power Analysis and Low Power Design Methodology of a Core Processor, Master Thesis, UC Berkeley, 1998.
[88] H. S. Lee, G. S. Tyson, "Region-Based Caching: An Energy-Delay Efficient Memory Architecture for Embedded Processors," IEEE International Conference on Compilers, Architecture and Synthesis for Embedded Systems, pp. 120-127, November 2000.
[89] T. C. Lee, S. Malik, V. Tiwari, M. Fujita, "Power Analysis and Minimization Techniques for Embedded DSP Software," IEEE Transactions on VLSI Systems, Vol. 5, No. 1, pp. 123-135, March 1997.
[90] H. Lekatsas, W. Wolf, "Code Compression for Low Power Embedded Systems," DAC-37: ACM/IEEE Design Automation Conference, pp. 294-299, Los Angeles, CA, June 2000.
[91] Y. Li, J. Henkel, "A Framework for Estimating and Minimizing Energy Dissipation of Embedded HW/SW Systems," DAC-35: ACM/IEEE Design Automation Conference, pp. 188-193, San Francisco, CA, June 1998.
[92] S. Y. Liao, S. Devadas, K. Keutzer, "Code Density Optimization for Embedded DSP Processors Using Data Compression Techniques," IEEE Transactions on CAD/ICAS, Vol. 17, No. 7, pp. 601-608, July 1998.
[93] D. Lidsky, J. Rabaey, "Low-Power Design of Memory Intensive Functions," IEEE Symposium on Low Power Electronics, pp. 16-17, San Diego, CA, September 1994.
[94] E. Macii, M. Pedram, F. Somenzi, "High-Level Power Modeling, Estimation, and Optimization," IEEE Transactions on CAD/ICAS, Vol. 17, No. 11, pp. 1061-1079, November 1998.
[95] K. Mai, et al., "Smart Memories: A Modular Reconfigurable Architecture," ISCA-00: ACM/IEEE International Symposium on Computer Architecture, pp. 161-171, Vancouver, BC, 2000.
[96] H. Mehta, R. M. Owens, M. J. Irwin, "Some Issues in Gray Code Addressing," GLS-VLSI-96: ACM/IEEE Great Lakes Symposium on VLSI, pp. 178-180, Ames, IA, March 1996.
[97] J. Meindl, "Low Power Microelectronics: Retrospect and Prospect," Proceedings of the IEEE, Vol. 83, No. 4, pp. 619-635, April 1995.
[98] V. Milutinovic, B. Markovic, M. Tomasevic, M. Tremblay, "The Split Temporal/Spatial Cache: A Complexity Analysis," SCIzzL-6 Workshop, pp. 89-96, Santa Clara, CA, September 1996.
[99] J. Monteiro, S. Devadas, A. Ghosh, "Retiming Sequential Circuits for Low Power," ICCAD-93: IEEE/ACM International Conference on Computer-Aided Design, pp. 398-402, Santa Clara, CA, November 1993.
[100] J. Monteiro, S. Devadas, P. Ashar, A. Mauskar, "Scheduling Techniques to Enable Power Management," DAC-33: ACM/IEEE Design Automation Conference, pp. 349-352, Las Vegas, NV, June 1996.
[101] J. Monteiro, S. Devadas, A. Ghosh, "Sequential Logic Optimization for Low Power Using Input-Disabling Precomputation Architectures," IEEE Transactions on CAD/ICAS, Vol. 17, No. 3, pp. 279-284, March 1998.
[102] J. Monteiro, A. Oliveira, "Finite State Machine Decomposition for Low Power," DAC-35: ACM/IEEE Design Automation Conference, pp. 763-768, San Francisco, CA, June 1998.
[103] S. Muchnick, Advanced Compiler Design & Implementation, Morgan Kaufmann, 1997.
[104] M. Munch, B. Wurth, R. Mehra, J. Sproch, N. Wehn, "Automatic RT-Level Operand Isolation to Minimize Power Consumption in Datapaths," DATE-00: IEEE Design Automation and Test in Europe, pp. 624-631, Paris, France, March 2000.
[105] E. Musoll, J. Cortadella, "Scheduling and Resource Binding for Low Power," ISSS-95: IEEE International Symposium on System Synthesis, pp. 104-109, Cannes, France, April 1995.
[106] E. Musoll, T. Lang, J. Cortadella, "Working-Zone Encoding for Reducing the Energy in Microprocessor Address Buses," IEEE Transactions on VLSI Systems, Vol. 6, No. 4, pp. 568-572, December 1998.
[107] L. Nachtergaele, F. Catthoor, C. Kulkarni, "Random-Access Data Storage Components in Customized Architectures," IEEE Design and Test of Computers, Vol. 18, No. 3, pp. 40-54, May-June 2001.
[108] D. Pan, "A Tutorial on MPEG/Audio Compression," IEEE Multimedia, Vol. 2, No. 2, pp. 60-74, Summer 1995.
[109] C. Panasik, "Overcoming Obstacles to 3G Wireless Technology," Communication System Design, Vol. 7, No. 1, January 2001.
[110] P. R. Panda, N. Dutt, A. Nicolau, "Efficient Utilization of Scratch-Pad Memories in Embedded Processors," EDTC-97: IEEE European Design and Test Conference, pp. 7-11, Paris, France, March 1997.
[111] P. Panda, N. Dutt, Memory Issues in Embedded Systems-on-Chip Optimization and Exploration, Kluwer Academic Publishers, 1999.
[112] P. Panda, N. Dutt, A. Nicolau, "On-Chip vs. Off-Chip Memory: The Data Partitioning Problem in Embedded Processor-Based Systems," ACM Transactions on Design Automation of Electronic Systems, Vol. 5, No. 3, pp. 682-704, July 2001.
[113] P. R. Panda, F. Catthoor, N. D. Dutt, K. Danckaert, E. Brockmeyer, C. Kulkarni, A. Vandercappele, P. G. Kjeldsberg, "Data and Memory Optimization Techniques for Embedded Systems," ACM Transactions on Design Automation of Electronic Systems, Vol. 6, No. 2, pp. 149-206, April 2001.
[114] R. Panwar, D. Rennels, "Reducing the Frequency of Tag Compares for Low Power I-Cache Design," ISLPD-95: ACM/IEEE International Symposium on Low Power Design, pp. 57-62, Dana Point, CA, April 1995.
[115] C. Passerone, L. Lavagno, C. Sansoe, M. Chiodo, A. Sangiovanni, "Trade-Off Evaluation in Embedded System Design via Co-simulation," ASP-DAC-97: IEEE Asia South Pacific Design Automation Conference, pp. 291-297, Chiba, Japan, January 1997.
[116] D. Patterson, et al., "The Case for Intelligent RAM," IEEE Micro, Vol. 17, No. 2, pp. 34-44, March-April 1997.
[117] M. Powell, S. H. Yang, B. Falsafi, K. Roy, N. Vijaykumar, "Reducing Leakage in a High-Performance Deep-Submicron Instruction Cache," IEEE Transactions on VLSI Systems, Vol. 9, No. 1, pp. 77-89, February 2001.
[118] B. Prince, Semiconductor Memories, 2nd Ed., John Wiley & Sons, 1997.
[119] A. Raghunathan, S. Dey, N. Jha, "Glitch Analysis and Reduction in Register Transfer Level Power Optimization," DAC-33: ACM/IEEE Design Automation Conference, pp. 331-336, Las Vegas, NV, June 1996.
[120] R. Rajsuman, "Design and Test of Large Embedded Memories: An Overview," IEEE Design and Test of Computers, Vol. 18, No. 3, pp. 16-27, May-June 2001.
[121] S. Ramprasad, N. Shanbhag, I. Hajj, "Signal Coding for Low Power: Fundamental Limits and Practical Realizations," ISCAS-98: IEEE International Symposium on Circuits and Systems, pp. 1-4, Monterey, CA, May 1998.
[122] K. Roy, S. C. Prasad, "Circuit Activity Based Synthesis for Low Power Reliable Operations," IEEE Transactions on VLSI Systems, Vol. 1, No. 4, pp. 503-513, December 1993.
[123] M. Schlett, "Trends in Embedded Microprocessor Design," IEEE Computer, Vol. 31, No. 8, pp. 44-49, August 1998.
[124] S. Segars, K. Clarke, L. Goudge, "Embedded Control Problems, Thumb and the ARM7TDMI," IEEE Micro, Vol. 15, No. 5, pp. 22-30, October 1995.
[125] S. Segars, "The ARM9 Family - High Performance Microprocessors for Embedded Applications," ICCD-98: IEEE International Conference on Computer Design, pp. 230-235, Austin, TX, October 1998.
[126] Semiconductor Industry Association, 1999 International Technology Roadmap for Semiconductors, http://public.itrs.net.
[127] A. Seznec, "A Case for Two-Way Skewed-Associative Caches," ISCA-93: ACM/IEEE International Symposium on Computer Architecture, pp. 169-178, San Diego, CA, May 1993.
[128] Y. Shin, S.-K. Chae, K. Choi, "Partial Bus-Invert Coding for Power Optimization of System-Level Buses," ISLPED-98: ACM/IEEE International Symposium on Low Power Electronics and Design, pp. 127-129, Monterey, CA, August 1997.
[129] W. Shiue, C. Chakrabarti, "Memory Exploration for Low Power, Embedded Systems," DAC-36: ACM/IEEE Design Automation Conference, pp. 140-145, New Orleans, LA, June 1999.
[130] R. Siegmund, C. Kretzschmar, D. Müller, "Adaptive Partial Bus Invert for Power Efficient Data Transfer over Wide System Buses," SBCCI-00: Symposium on Integrated Circuit and System Design, pp. 371-376, Manaus, Brazil, August 2000.
[131] M. Srivastava, A. P. Chandrakasan, R. W. Brodersen, "Predictive System Shutdown and Other Architectural Techniques for Energy Efficient Programmable Computation," IEEE Transactions on VLSI Systems, Vol. 4, No. 1, pp. 42-55, March 1996.
[132] M. Stan, W. Burleson, "Bus-Invert Coding for Low-Power I/O," IEEE Transactions on VLSI Systems, Vol. 3, No. 1, pp. 49-58, January 1995.
[133] M. Stan, W. Burleson, "Low-Power Encodings for Global Communication in CMOS VLSI," IEEE Transactions on VLSI Systems, Vol. 5, No. 4, pp. 444-455, December 1997.
[134] C. L. Su, C. Y. Tsui, A. M. Despain, "Saving Power in the Control Path of Embedded Processors," IEEE Design and Test of Computers, Vol. 11, No. 4, pp. 24-30, Winter 1994.
[135] C. L. Su, A. M. Despain, "Cache Design Trade-Offs for Power and Performance Optimization: A Case Study," ISLPD-95: ACM/IEEE International Symposium on Low Power Design, pp. 63-68, Dana Point, CA, April 1995.
[136] M. Suzuoki, et al., "A Microprocessor with a 128-bit CPU, Ten Floating-Point MACs, Four Floating-Point Dividers and an MPEG-2 Decoder," IEEE Journal of Solid-State Circuits, Vol. 34, No. 11, pp. 1608-1618, November 1999.
[137] M. Takahashi, et al., "A 60-MHz 240-mW MPEG-4 Videophone LSI with 16-Mb embedded DRAM," IEEE Journal of Solid-State Circuits, Vol. 35, No. 11, pp. 1713-1721, November 2000.
[138] V. Tiwari, S. Malik, A. Wolfe, "Power Analysis of Embedded Software: A First Step Towards Software Power Minimization," IEEE Transactions on VLSI Systems, Vol. 2, No. 4, pp. 437-445, December 1994.
[139] V. Tiwari, S. Malik, P. Ashar, "Guarded Evaluation: Pushing Power Management to Logic Synthesis/Design," IEEE Transactions on CAD/ICAS, Vol. 17, No. 10, pp. 1051-1060, November 1998.
[140] H. V. Tran, et al., "A 2.5-V, 256-Level Nonvolatile Analog Storage Device using EEPROM Technology," IEEE International Solid-State Circuits Conference, pp. 270-271, San Francisco, CA, February 1996.
[141] C. Y. Tsui, M. Pedram, A. M. Despain, "Technology Decomposition and Mapping Targeting Low Power Dissipation," DAC-30: ACM/IEEE Design Automation Conference, pp. 68-73, Dallas, TX, June 1993.
[142] C. Y. Tsui, M. Pedram, A. M. Despain, "Low Power State Assignment Targeting Two- and Multi-Level Logic Implementations," ICCAD-94: IEEE/ACM International Conference on Computer-Aided Design, pp. 82-87, San Jose, CA, November 1994.
[143] UMC, Embedded 6T Static RAM Macros Datasheet, http://www.umc.com, 1999.
[144] Virage Logic, Custom-Touch Memory Compiler Datasheet, http://www.viragelogic.com, 1999.
[145] E. Vittoz, "Low Power Microelectronics: Ways to Approach the Limits," International Conference on Solid-State Circuits, pp. 14-18, San Francisco, CA, January 1994.
[146] S. J. Walsh, J. A. Board, "Pollution Control Caching," ICCD-95: IEEE International Conference on Computer Design, pp. 300-306, Austin, TX, October 1995.
[147] S. J. E. Wilton, N. P. Jouppi, "CACTI: An Enhanced Cache Access and Cycle Time Model," IEEE Journal of Solid-State Circuits, Vol. 31, No. 5, pp. 677-687, May 1996.
[148] Y. Yoshida, B. Song, H. Okuhata, T. Onoye, I. Shirakawa, "An Object Code Compression Approach to Embedded Processors," ISLPED-97: ACM/IEEE International Symposium on Low Power Electronics and Design, pp. 265-268, Monterey, CA, August 1997.
[149] K. Yoshikawa, "Embedded Flash Memories - Technology Assessment and Future," IEEE International Symposium on VLSI Technology, Systems and Applications, pp. 183-186, Taipei, Taiwan, June 1999.
[150] V. Zyuban, P. Kogge, "The Energy Complexity of Register Files," ISLPED-98: ACM/IEEE International Symposium on Low Power Electronics and Design, pp. 305-310, Monterey, CA, August 1998.
Index
AAC LC, 31 ADPCM, 33 AGP, 17 ALU, 13 AMBA, 18 AMR, 31, 33 ARM, 17, 48, 83, 88
ARM7TDMI, 104, 111 Thumb, 104
ARMulator, 89, 96 ASIC, 13
design, 14, 16 market, 14
ASM, 69 Access profile, 74 Adaptive encoding, 49 Address decoder, 72 Annex cache, 45 Application-specific memory, 69 Average power, 4 Back-end flow, 73 Bandwidth, 46
optimization, 46 Basic block, 46, 49, 119 Battery
life-time, 4 Bin-packing, 85 Bipartitioning, 75 Block halos, 85 Blue Logic, 15 Branch instruction, 107 Branch target, 108 Buffer
compressed instruction, 111 non-critical, 45 pre-decoded, 46 scratch-pad, 46 speculative, 45
Bus invert (BI), 49 clustered, 49
partitioned, 49 Bus, 7
address, 49, 87 data, 87 encoding, 7, 48 energy, 99
CAS, 39 CIB, 111 CMOS, 4-5
variable threshold, 32 CPU, 88
core, 89 Cache
annex, 45 associativity, 42 column associative, 50 hit rate, 39 hit ratio, 45 line size, 42 loop, 46 miss rate, 50 replacement policy, 42 skewed-associative, 50 spatial, 44 sub-banking, 43 traffic-efficient, 47 victim, 45 way-predicting, 45
Clock, 4 frequency, 4
Clock-gating, 9 Code
compression, 47 density, 47 reordering, 46
Codebook-based encoding, 50 Column-associative cache, 50 Compression, 47
ratio, 105, 108 schemes, 105
Computer-aided design (CAD), 8 Conflict misses, 50 Coprocessor, 27 Core processor, 10, 40, 73 Correlation, 48
spatial, 44 spatio-temporal, 49 temporal, 44
Critical path, 88 DCT, 32 DLX, 104 DMA, 29, 32, 35 DRAM, 13, 15, 39
Rambus, 29 embedded, 20, 23, 26, 31
DSPs, 7 Data encoding, 7 Data
compression, 37 encoding, 48 transfer optimization, 48
Decompression unit, 109 Deep-submicron (DSM), 73 Design closure, 73 Design
cycles, 14 high-level, 5 system-level, 6
Dynamic access profile, 71, 73 EEPROM, 23, 25 Electronic Design Automation (EDA), 17, 71
tools, 17
Embedded DRAM, 23 Embedded SRAM, 42 Embedded system, 6-7, 9 Embedded
SRAM, 70 application, 10, 69, 72-73, 91 memory, 19 processors, 2 software, 8
Embedded-system real-time, 69
Emotion engine, 27 Encoding, 48
adaptive, 49 bus-invert, 49 codebook-based, 50
Energy management, 7, 9 Energy, 4
bus energy, 104 fetch, 114 instruction decompression energy, 104 model, 74 optimization, 102
Energy-aware scheduling, 9
Entropy, 49 FGMOS, 25, 33 FIFO, 31 FLASH, 23, 25, 33, 104, 117 FMAC, 29 FSM decomposition, 9 Finite state machine (FSM), 9 Flat memory, 37 Floating-gate transistor, 25 Floating-point, 27 Floorplan, 23, 73 Floorplanning, 82, 84 Foundry verified qualification, 17 GFLOPS, 30 Gate freezing, 9 Gate resizing, 9 Glitch, 9
filtering, 9 Gray code, 49 Guarded evaluation, 9 HW/SW partitioning, 18 Hamming distance, 49 Hard macros, 17, 71 Hardware prefetching, 47 Hardware synthesis, 8 High-level design, 5 Hit rate, 39 IBM, 15 IDT, 101, 111 ILP, 46 IP, 11, 16
qualification, 17 vendors, 17
ITU H.223, 31
ITU-T G.726, 35
Insertion sort, 96 Instruction compression, 11 Instruction, 99
compression, 104 compression, 99 decompression table, 101 decompression, 104 fetching/decompression logic, 101
Instruction-level parallelism, 46 Instruction-level simulator, 71, 74 Intel, 13 Intellectual property (IP), 11, 16 Kernel extraction, 9 LSI, 13 Layout, 70, 82
valid, 89 Load capacitance, 4 Locality
temporal, 37 Logical partitioning, 43
Loop cache, 46 MIPS, 28, 100, 104
DLX, 111 R4000, 100
MPEG, 70 MPEG2, 30 MPEG4, 30 MUX
output, 84 Mark, 104, 108, 119 Memory generator, 22 Memory
access trace, 89 application-specific, 69 architecture, 10 bandwidth, 19, 46 cut, 74, 84, 89, 92 dedicated-process, 20 embedded, 19 energy, 9, 74, 99 fetch energy, 104 flat, 37 generator, 22, 74, 83-84 hierarchy design, 40 hierarchy, 38 interface optimization, 48 interface, 48 latency, 46-47 market, 14 non volatile, 20 non-volatile, 25 partitioning, 10, 43 process-compatible, 20 processor interface, 99 read energy, 104 select signal, 83 traffic, 11, 47, 105, 114 usage, 105, 116 volatile, 20
Microcontrollers, 7, 13 Moore's law, 2, 13 Multi-chip modules (MCM), 25 NMOS, 5, 21 Non-critical buffer, 45 OS, 126 Operand isolation, 9 Operating system, 126 Over-the-cell routing, 22 PCB, 19, 25 PCI, 17 PCM, 35 PMI, 46 PMOS, 21 Package, 3 Parasitics, 89 Partitioning
logical, 43
physical, 43 Peak power, 4 Physical design, 70 Physical partitioning, 43 Place and Route (P&R), 73, 82, 84-85 Placement and routing (P&R), 84 Placement, 73, 85
automatic, 85 floorplan-directed, 85 legal, 85
Platform-based design, 17 Playstation 2, 27 Power
average, 4 distribution, 84, 87 management, 7, 9 metrics, 3 peak, 4 short-circuit, 5 switching, 4
Power-delay product, 4 PowerPC, 17 Pre-decoded instruction buffers, 46 Pre-silicon qualification, 17 Precomputation, 9 Prefetching, 47 Processor
core, 10, 40, 69, 71, 73-74 market, 14
Production rating, 17 Profiling, 100 Ptolemy, 92 QCIF, 33 RAM macro compilers, 71 RAM, 104 RAS, 39 RISC, 27-28, 31, 48 ROM, 15, 23, 104, 117 Rambus, 29 Random White Noise (RWN), 48 Real-time embedded systems, 69 Region-based caching, 44 Register files, 42 Resource allocation, 9 Retiming, 9 Routing, 73, 86
block, 84 cell, 84
SIMD, 28 SRAM, 13, 15, 70-71, 104
design view, 22 embedded, 42, 70 frame view, 22 generator, 83 low-leakage, 20 on-chip, 73-74 power view, 22
segmented, 44 ST, 83, 89 STG restructuring, 9 Scheduling
energy-aware, 9 Scratch-pad buffer, 46 Scratch-pad memory, 46 Segmented SRAM, 44 Sense amplifiers, 22 Shift registers, 13 Short-circuit power, 5 Signal integrity, 4 Silicon fabs, 17 Skewed-associative cache, 50 Sleep mode, 43 Soft macros, 71 Sony, 27 Spatial cache, 44 Spatio-temporal correlation, 49 Speculative buffer, 45 State re-encoding, 9 State transition graph (STG), 9 Super-block, 120 Supply voltage, 4 Switching activity, 4, 85, 87, 92 Switching power, 4 Synthesis, 8
logic, 89
physical, 89 System
architecture, 7 design, 8
System-level design, 6 System-level exploration, 91 System-on-Chip (SoC), 5, 10-11, 15, 69-70 TLB, 42 Temporal cache, 44 Temporal locality, 37 Temporal
cache, 44 Thumb instruction set, 48 Toggle count, 92 Toshiba, 27 Transformations
computation, 40 Translation look-aside buffer, 42 Twin-VQ, 31 VLIW processor, 47 Variable supply voltage, 8 Vector processing unit (VPU), 27 Verilog, 83-84, 89 Victim cache, 45 Way-predicting cache, 45 Wire length, 85-86 Tapeout, 14 X86, 13