
Page 1: AI for Architecture - Oregon State University

AI for Architecture:

Principles and Prospects

for the Next Paradigm

College of Engineering System Technology and Architecture Research (STAR) Lab

Drew Penney and Lizhong Chen

Page 2: AI for Architecture - Oregon State University

Motivating the Next Paradigm

Opposing trends in past decade
• Machine learning boom
• Moore’s law bust

Machine learning supplants Moore’s law
• How do we close the loop?

Page 3: AI for Architecture - Oregon State University

Agenda

ML Background

Literature review
• How has machine learning been applied?

Analysis of current practice
• What strategies are most effective?

Future work
• Where do we go from here?

Page 4: AI for Architecture - Oregon State University

Agenda

ML Background

Literature review
• How has machine learning been applied?

Analysis of current practice
• What strategies are most effective?

Future work
• Where do we go from here?

Page 5: AI for Architecture - Oregon State University

ML Background

Fundamental applicability of ML
• Powerful, yet generic mathematical framework
• Task specification adaptation
  - IPC prediction example (sketched below)
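A minimal sketch of that IPC-prediction example, posed as supervised regression (not from the slides; the data, feature names, and choice of a linear model are synthetic stand-ins):

```python
# Hypothetical sketch: IPC prediction as supervised regression.
# In practice, features (cache misses, branch mispredictions, ...) and
# target IPC would come from simulation or hardware performance counters.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic stand-in data: one row per program interval.
# Columns: L2 misses/kilo-instr, branch mispred/kilo-instr, issue width used.
X = rng.uniform(0.0, 10.0, size=(200, 3))
# Synthetic "ground truth" IPC with noise, just to make the example run.
y = 2.0 - 0.12 * X[:, 0] - 0.08 * X[:, 1] + 0.05 * X[:, 2] \
    + rng.normal(0, 0.05, 200)

model = LinearRegression().fit(X[:150], y[:150])     # train on 150 intervals
print("Predicted IPC:", model.predict(X[150:153]))   # query held-out intervals
print("R^2 on held-out data:", model.score(X[150:], y[150:]))
```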

Page 6: AI for Architecture - Oregon State University

ML Background

Diverse learning approaches & models
• Supervised
• Unsupervised
• Semi-supervised
• Reinforcement learning

Page 7: AI for Architecture - Oregon State University

ML Background: Supervised Learning

Diverse learning approaches & models
• Supervised
• Unsupervised
• Semi-supervised
• Reinforcement learning

Page 8: AI for Architecture - Oregon State University

ML Background: Supervised Learning

Decision trees
• Tree structure
  - Node = feature
  - Branch = feature value(s)
• Simple, sequential, low overhead
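As a hedged illustration of the tree structure above (features and data are invented), a shallow regression tree needs only a few comparisons per prediction, which is what keeps overhead low:

```python
# Minimal decision-tree sketch: each internal node tests one feature,
# each branch covers a range of that feature's values.
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(300, 2))          # e.g., cache miss rate, branch MPKI
y = np.where(X[:, 0] > 0.5, 0.8, 1.6) + rng.normal(0, 0.05, 300)  # synthetic IPC

tree = DecisionTreeRegressor(max_depth=2).fit(X, y)
# A shallow tree keeps inference overhead low: a handful of comparisons.
print(export_text(tree, feature_names=["miss_rate", "branch_mpki"]))
```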

Page 9: AI for Architecture - Oregon State University

ML Background: Supervised Learning

Bayesian networks
• Conditional relationships
  - Node = random variable
  - Edge = conditional dependence
• Scale w/ features
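A toy illustration of the node/edge structure above, with invented variables and probabilities: one conditional dependence (miss rate → IPC) and a Bayes'-rule query, in plain Python:

```python
# Toy two-node Bayesian network: MissRate -> IPC.
# Node = random variable, edge = conditional dependence.
p_high_miss = 0.3                                  # P(MissRate = high)
p_low_ipc_given = {"high": 0.8, "low": 0.1}        # P(IPC = low | MissRate)

# Marginalize: P(IPC = low) = sum over miss-rate states.
p_low_ipc = (p_low_ipc_given["high"] * p_high_miss
             + p_low_ipc_given["low"] * (1 - p_high_miss))

# Invert with Bayes' rule: P(MissRate = high | IPC = low).
p_high_miss_given_low_ipc = p_low_ipc_given["high"] * p_high_miss / p_low_ipc
print(f"P(low IPC) = {p_low_ipc:.2f}")
print(f"P(high miss | low IPC) = {p_high_miss_given_low_ipc:.2f}")
```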

Page 10: AI for Architecture - Oregon State University

ML Background: Supervised Learning

Support Vector Machines (SVMs)
• Optimize prediction margin
• Simplest case linear
• Use “kernel trick” for non-linear
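A brief sketch of the linear vs. kernel distinction above, using scikit-learn's SVR on synthetic non-linear data (all values illustrative, not from the slides):

```python
# Sketch: linear vs. RBF-kernel support vector regression on synthetic data.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.1, 200)    # clearly non-linear target

linear = SVR(kernel="linear").fit(X, y)            # simplest (linear) case
rbf = SVR(kernel="rbf").fit(X, y)                  # "kernel trick" case
print("linear R^2:", round(linear.score(X, y), 2))
print("rbf    R^2:", round(rbf.score(X, y), 2))    # kernel fits the curve
```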

Page 11: AI for Architecture - Oregon State University

ML Background: Supervised Learning

Neural networks
• Perceptron (one layer)
• Deep neural network (fully connected)
• Convolutional neural network (spatially aware convolution layers)
• Recurrent neural network (output → input loops)
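For the fully connected case above, a minimal sketch with scikit-learn's MLPRegressor (layer sizes and data are arbitrary choices, not from the slides):

```python
# Sketch: a small fully connected network (two hidden layers) for regression.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(3)
X = rng.uniform(0, 1, size=(500, 4))
y = X[:, 0] * X[:, 1] + 0.5 * X[:, 2]              # non-linear synthetic target

net = MLPRegressor(hidden_layer_sizes=(32, 32),    # two fully connected layers
                   max_iter=2000, random_state=0).fit(X, y)
print("R^2:", round(net.score(X, y), 2))
```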

Page 12: AI for Architecture - Oregon State University

ML Background: Unsupervised Learning

Diverse learning approaches & models
• Supervised
• Unsupervised
• Semi-supervised
• Reinforcement learning

Page 13: AI for Architecture - Oregon State University

ML Background: Semi-supervised Learning

Diverse learning approaches & models
• Supervised
• Unsupervised
• Semi-supervised
• Reinforcement learning

Page 14: AI for Architecture - Oregon State University

ML Background: Reinforcement Learning

Diverse learning approaches & models
• Supervised
• Unsupervised
• Semi-supervised
• Reinforcement learning

Page 15: AI for Architecture - Oregon State University

ML Background: Reinforcement Learning

Q-Learning
• Approximate optimal actions w/ stored action-value pairs
• Table-based

Deep Q-Learning
• Approximate optimal actions w/ neural network
• Weight storage
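A minimal sketch of the tabular update that both variants approximate, Q(s,a) ← Q(s,a) + α(r + γ·max Q(s',·) − Q(s,a)); state/action counts and rates are placeholders:

```python
# Sketch: one tabular Q-learning update over stored action-value pairs.
import numpy as np

n_states, n_actions = 8, 4
Q = np.zeros((n_states, n_actions))                # the stored table
alpha, gamma = 0.1, 0.9                            # learning rate, discount

def update(s, a, r, s_next):
    """Move Q(s, a) toward the bootstrapped target r + gamma * max Q(s')."""
    target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])

update(s=0, a=2, r=1.0, s_next=3)
print(Q[0])
# Deep Q-learning replaces this table with a neural network mapping a state
# to action values, trading table storage for weight storage.
```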

Page 16: AI for Architecture - Oregon State University

Agenda

ML Background

Literature review
• How has machine learning been applied?

Analysis of current practice
• What strategies are most effective?

Future work
• Where do we go from here?

Page 17: AI for Architecture - Oregon State University

Literature Review: Overview

Application topics
• System simulation
• GPUs
• Memory systems & branch prediction
• Networks-on-Chip
• System-level optimization
• ML-enabled approximate computing

Page 18: AI for Architecture - Oregon State University

Literature Review: System Simulation

Reduced execution time vs. cycle-accurate simulation
• Several orders of magnitude faster
• Small accuracy penalty

Ipek [8], Agarwal [86]

Mechanistic-empirical models
• ML + other model types: easier & better

Eyerman [10]

Cross-architecture predictions
• Predict any architecture using available HW

Zheng [11, 12]

Page 19: AI for Architecture - Oregon State University

Literature Review: GPUs

Design space exploration
• Highly irregular scaling: no problem

Wu [14], Jia [13], Jooya [15], Lin [16]

Cross-platform prediction (CPU → GPU)
• Save development time for important work

Baldini [17], Ardalani [18] & [87]

Page 20: AI for Architecture - Oregon State University

Literature Review: GPUs

Scheduling
• High-performance heterogeneous architecture

Pattnaik [20]

Traffic pattern characterization
• Automatically identify prevalent patterns

Li [90]

Page 21: AI for Architecture - Oregon State University

Literature Review: Memory & Branch Prediction

Caches (prefetch & re-use)
• Effective modeling for complex patterns

Peled [21], Wang [24], Zeng & Guo [22], Teran [23], Braun & Litz [84], Bhatia [92]

Schedulers & control
• High performance under constraints

Ipek [25], Deng [29], Mukundan & Martinez [26], Ipek [27], Yoo [28], Yoo [30]

Branch prediction
• State-of-the-art accuracy

St. Amant [31], Jimenez [32], Tarsa [85], Garza [93]

Page 22: AI for Architecture - Oregon State University

Literature Review: Networks-on-Chip

DVFS & link control
• Optimal proactive control

Fettes [38], Savva [33], DiTomaso [34], Winkle [35], Reza [36], Clark [37]

Flow control
• Reduce latency and increase efficiency

Daya [39], Yin [40]

Page 23: AI for Architecture - Oregon State University

Literature Review: Networks-on-Chip

Topology & general design
• Efficient exploration in vast design space

Das [41], Joardar [43], Lin [44] & [91], Das [42], Rao [45]

Reliability
• Find optimal balance in diverse policies

Wang [49] & [88], DiTomaso [48]

Page 24: AI for Architecture - Oregon State University

Literature Review: System-Level Optimization

Energy efficiency optimization
• Significant energy reduction
• Minimal performance reduction

Won [50], Bai [55], Pan [51], Bailey [52], Lo [53], Mishra [54], Chen & Marculescu [57], Chen [58], Imes [59], Tarsa [89]

Task allocation & resource management
• Consideration for long-term impact

Lu [60], Jain [65], Nemirovsky [61], Zhang [62], Bitirgen [63], Wang [64], Ding [90]

Page 25: AI for Architecture - Oregon State University

Literature Review: System-Level Optimization

Chip layout
• Replace standard design practice

Wu [66]

Page 26: AI for Architecture - Oregon State University

Literature Review: ML-Enabled Approximate Computing

Function approximation
• Reduce energy & execution time
• Small quality penalty

Esmaeilzadeh [67], Yazdanbakhsh [68], Grigorian [69], Oliveira [71]

Statistical guarantees
• User-controlled trade-offs, guaranteed quality

Mahajan [70]

Page 27: AI for Architecture - Oregon State University

Agenda

ML Background

Literature review

• How has machine learning been applied?

Analysis of current practice

• What strategies are most effective?

Future work

• Where do we go from here?

Page 28: AI for Architecture - Oregon State University

Analysis of Current Practice

Two general categories

• Online ML application (ML integrated)
  - Integrate ML at runtime
  - Limited by practical constraints

• Offline ML application (ML supported)
  - Support architecture during design
  - Generally higher complexity

Page 29: AI for Architecture - Oregon State University

Current Practice: Online Applications

Model selection
• Primarily supervised or RL
• Some tasks can use either (w/ limitations) [38]
• Some tasks supervised only

Page 30: AI for Architecture - Oregon State University

Current Practice: Online Applications

Implementation & overhead
• Dedicated vs. opportunistic data collection [39], [50]
• Hardware vs. software [67], [50]
• Hardware trade-offs [33], [31], [25], [26]

Page 31: AI for Architecture - Oregon State University

Current Practice: Online Applications

Optimization
• Mitigate online learning side effects
  - Update model, not system [21]
  - Initial alternate controller [50]

Page 32: AI for Architecture - Oregon State University

Current Practice: Offline Applications

Model/feature selection
• Not limited by hardware constraints → substantial diversity
• Design space exploration
  - Iterative search [41], [42], [43], [44], [91]
  - Standard prediction [15], [45], [47]
• Some tasks supervised only

Page 33: AI for Architecture - Oregon State University

Current Practice: Offline Applications

Optimization
• Improve data efficiency & model accuracy
• Ensembles
  - Subset choices & outlier removal [15], [18]
• Sampling (avoid systematic bias) [47]

Page 34: AI for Architecture - Oregon State University

Current Practice: Domain Knowledge

Mechanistic-empirical models
• Simple & avoid assumptions
• High accuracy
[10], [45]

Task-specific considerations
• Complex feature handling [29]
• Result interpretation [18]

Page 35: AI for Architecture - Oregon State University

Agenda

ML Background

Literature review
• How has machine learning been applied?

Analysis of current practice
• What strategies are most effective?

Future work
• Where do we go from here?

Page 36: AI for Architecture - Oregon State University

Future Work: Implementation

New strategies → effective application
• Pruning (sketched below)
  - Train complex models
  - Implement simple models
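One plausible reading of this train-complex/deploy-simple strategy is magnitude pruning; the sketch below (sizes and sparsity target are invented) zeros the smallest weights of a trained layer so the deployed model is sparse:

```python
# Sketch: magnitude pruning of one weight matrix. Train a complex model,
# then zero the smallest-magnitude weights so the deployed model is cheap.
import numpy as np

rng = np.random.default_rng(4)
W = rng.normal(0, 1, size=(64, 64))                # weights of a "trained" layer

sparsity = 0.9                                     # drop 90% of weights
threshold = np.quantile(np.abs(W), sparsity)
mask = np.abs(W) >= threshold                      # keep only large weights
W_pruned = W * mask

print("nonzero fraction:", mask.mean())            # ~0.1 after pruning
# In practice a brief fine-tuning pass typically recovers most accuracy.
```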

Page 37: AI for Architecture - Oregon State University

Future Work: Improvements

New models & architecture-aware techniques
• Hierarchical models
  - Model high- & low-level details
• Phase-level & nanosecond scale
  - Finer-grain prediction/control

Page 38: AI for Architecture - Oregon State University

Future Work: Tools

Tools
• Ideas limited by application complexity
• General-purpose framework → more accessible implementation

Page 39: AI for Architecture - Oregon State University

Future Work: Applications

Applications
• Extend existing approaches
  - Emerging technologies & architectures
  - System-level approximate computing
• Long-term potential
  - System-wide co-optimization
  - Automated design

Page 40: AI for Architecture - Oregon State University

Conclusion

Broad applicability

Many opportunities

Future potential - automated architecture

Page 41: AI for Architecture - Oregon State University

References Cited

[8] E. Ipek, S. A. McKee, B. R. de Supinski, M. Schulz, and R. Caruana, “Efficiently exploring architectural design spaces via predictive modeling,” in International Conference on Architectural Support for Programming Languages and Operating Systems, Oct. 2006.

[10] S. Eyerman, K. Hoste, and L. Eeckhout, “Mechanistic-empirical processor performance modeling for constructing cpi stacks on real hardware,” in International Symposium on Performance Analysis of Systems and Software, Apr. 2011.

[11] X. Zheng, P. Ravikumar, L. K. John, and A. Gerstlauer, “Learning-based analytical cross-platform performance prediction,” in International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation, July 2015.

[12] X. Zheng, L. K. John, and A. Gerstlauer, “Accurate phase-level cross-platform power and performance estimation,” in Design Automation Conference, June 2016.

[13] W. Jia, K. A. Shaw, and M. Martonosi, “Stargazer: Automated regression-based gpu design space exploration,” in International Symposium on Performance Analysis of Systems and Software, Apr. 2012.

[14] G. Wu, J. L. Greathouse, A. Lyashevsky, N. Jayasena, and D. Chiou, “Gpgpu performance and power estimation using machine learning,” in International Symposium on High Performance Computer Architecture, Feb. 2015.

[15] A. Jooya, N. Dimopoulos, and A. Baniasadi, “Multiobjective gpu design space exploration optimization,” in International Conference on High Performance Computing & Simulation, July 2016.

[16] T.-R. Lin, Y. Li, M. Pedram, and L. Chen, “Design space exploration of memory controller placement in throughput processors with deep learning,” in IEEE Computer Architecture Letters, vol. 18, Mar. 2019.

[17] I. Baldini, S. J. Fink, and E. Altman, “Predicting gpu performance from cpu runs using machine learning,” in International Symposium on Computer Architecture and High Performance Computing, Oct. 2014.

[18] N. Ardalani, C. Lestourgeon, K. Sankaralingam, and X. Zhu, “Cross-architecture performance prediction (xapp) using cpu code to predict gpu performance,” in International Symposium on Microarchitecture, June 2015.

[20] A. Pattnaik, X. Tang, A. Jog, O. Kayiran, A. K. Mishra, M. T. Kandemir, O. Mutlu, and C. R. Das, “Scheduling techniques for gpu architectures with processing-in-memory capabilities,” in International Conference on Parallel Architecture and Compilation Techniques, Sept. 2016.

Page 42: AI for Architecture - Oregon State University

References Cited

[21] L. Peled, S. Mannor, U. Weiser, and Y. Etsion, “Semantic locality and context-based prefetching using reinforcement learning,” in International Symposium on High Performance Computer Architecture, Feb. 2015.

[22] Y. Zeng and X. Guo, “Long short term memory based hardware prefetcher,” in Proceedings of the International Symposium on Memory Systems, Oct. 2017.

[23] E. Teran, Z. Wang, and D. A. Jimenez, “Perceptron learning for reuse prediction,” in International Symposium on Microarchitecture, Oct. 2016.

[24] H. Wang, X. Yi, P. Huang, B. Cheng, and K. Zhou, “Efficient ssd caching by avoiding unnecessary writes using machine learning,” in International Conference on Parallel Processing, Aug. 2018.

[25] E. Ipek, O. Mutlu, J. F. Martinez, and R. Caruana, “Self-optimizing memory controllers: A reinforcement learning approach,” in International Symposium on Computer Architecture, June 2008.

[26] J. Mukundan and J. F. Martinez, “Morse: Multi-objective reconfigurable self-optimizing memory scheduler,” in International Symposium on High-Performance Computer Architecture, Feb. 2012.

[27] S. Wang and E. Ipek, “Reducing data movement energy via online data clustering and encoding,” in International Symposium on Microarchitecture, Oct. 2016.

[28] W. Kang and S. Yoo, “Dynamic management of key states for reinforcement learning-assisted garbage collection to reduce long tail latency in ssd,” in Design Automation Conference, June 2018.

[29] Z. Deng, L. Zhang, N. Mishra, H. Hoffman, and F. T. Chong, “Memory cocktail therapy: A general learning-based framework to optimize dynamic tradeoffs in nvms,” in International Symposium on Microarchitecture, Oct. 2017.

[30] J. Xiao, Z. Xiong, S. Wu, Y. Yi, H. Jin, and K. Hu, “Disk failure prediction in data centers via online learning,” in International Conference on Parallel Processing, June 2018.

[31] R. St. Amant, D. A. Jimenez, and D. Burger, “Low-power, high-performance analog neural branch prediction,” in International Symposium on Microarchitecture, Nov. 2008.

[32] D. A. Jimenez, “An optimized scaled neural branch predictor,” in International Conference on Computer Design, Oct. 2011.

Page 43: AI for Architecture - Oregon State University

References Cited

[33] A. G. Savva, T. Theocharides, and V. Soteriou, “Intelligent on/off dynamic link management for on-chip networks,” in Journal of Electrical and Computer Engineering, Jan 2012.

[34] D. DiTomaso, A. Sikder, A. Kodi, and A. Louri, “Machine learning enabled power-aware network-on-chip design,” in Design, Automation and Test in Europe, Mar. 2017.

[35] S. V. Winkle, A. Kodi, R. Bunescu, and A. Louri, “Extending the power-efficiency and performance of photonic interconnects for heterogeneous multicores with machine learning,” in International Symposium on High Performance Computer Architecture, Feb. 2018.

[36] M. F. Reza, T. T. Le, B. De, M. Bayoumi, and D. Zhao, “Neuro-noc: Energy optimization in heterogeneous many-core noc using neural networks in dark silicon era,” in International Symposium on Circuits and Systems, May 2018.

[37] M. Clark, A. Kodi, R. Bunescu, and A. Louri, “Lead: Learning-enabled energy-aware dynamic voltage/frequency scaling in nocs,” in Design Automation Conference, June 2018.

[38] Q. Fettes, M. Clark, R. Bunescu, A. Karanth, and A. Louri, “Dynamic voltage and frequency scaling in nocs with supervised and reinforcement learning techniques,” IEEE Transactions on Computers, vol. 68, Mar. 2019.

[39] B. K. Daya, L.-S. Peh, and A. P. Chandrakasan, “Quest for high-performance bufferless nocs with single-cycle express paths and self-learning throttling,” in Design Automation Conference, June 2016.

[40] J. Yin, Y. Eckert, S. Che, M. Oskin, and G. H. Loh, “Toward more efficient noc arbitration: A deep reinforcement learning approach,” in International Workshop on AI-assisted Design for Architecture, June 2018.

[41] S. Das, J. R. Doppa, D. H. Kim, P. P. Pande, and K. Chakrabarty, “Optimizing 3d noc design for energy efficiency: A machine learning approach,” in International Conference on Computer-Aided Design, Nov. 2015.

[42] S. Das, J. R. Doppa, P. P. Pande, and K. Chakrabarty, “Energy-efficient and reliable 3d network-on-chip (noc): Architectures and optimization algorithms,” in International Conference on Computer-Aided Design, Nov. 2016.

Page 44: AI for Architecture - Oregon State University

References Cited

[43] B. K. Joardar, R. G. Kim, J. R. Doppa, P. P. Pande, D. Marculescu, and R. Marculescu, “Learning-based application-agnostic 3d noc design for heterogeneous manycore systems,” IEEE Transactions on Computers, vol. 68, June 2019.

[44] T.-R. Lin, D. Penney, M. Pedram, and L. Chen, “Optimizing routerless network-on-chip designs: an innovative learning-based framework,” May 2019. arXiv:1905.04423.

[45] N. Rao, A. Ramachandran, and A. Shah, “Mlnoc: A machine learning based approach to noc design,” in International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), Sept. 2018.

[47] K. Sangaiah, M. Hempstead, and B. Taskin, “Uncore rpd: Rapid design space exploration of the uncore via regression modeling,” in International Conference on Computer-Aided Design, Nov. 2015.

[48] D. DiTomaso, T. Boraten, A. Kodi, and A. Louri, “Dynamic error mitigation in nocs using intelligent prediction techniques,” in International Symposium on Microarchitecture, Oct. 2016.

[49] K. Wang, A. Louri, A. Karanth, and R. Bunescu, “High-performance, energy-efficient, fault-tolerant network-on-chip design using reinforcement learning,” in Design, Automation and Test in Europe, Mar. 2019.

[50] J.-Y. Won, X. Chen, P. Gratz, J. Hu, and V. Soteriou, “Up by their bootstraps: Online learning in artificial neural networks for cmp uncore power management,” in International Symposium on High Performance Computer Architecture, Feb. 2014.

[51] G.-Y. Pan, J.-Y. Jou, and B.-C. Lai, “Scalable power management using multilevel reinforcement learning for multiprocessors,” in Transactions on Design Automation of Electronic Systems, Aug. 2014.

[52] P. E. Bailey, D. K. Lowenthal, V. Ravi, B. Rountree, M. Schulz, and B. R. de Supinski, “Adaptive configuration selection for power-constrained heterogeneous systems,” in International Conference on Parallel Processing, Sept. 2014.

[53] D. Lo, T. Song, and G. E. Suh, “Prediction-guided performance-energy trade-off for interactive applications,” in International Symposium on Microarchitecture, Dec. 2015.

Page 45: AI for Architecture - Oregon State University

References Cited

[54] N. Mishra, J. D. Lafferty, and H. Hoffman, “Caloree: Learning control for predictable latency and low energy,” in International Conference on Architectural Support for Programming Languages and Operating Systems, Mar. 2018.

[55] Y. Bai, V. W. Lee, and E. Ipek, “Voltage regulator efficiency aware power management,” in Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, Apr. 2017.

[57] Z. Chen and D. Marculescu, “Distributed reinforcement learning for power limited many-core system performance optimization,” in Design, Automation and Test in Europe, Mar. 2015.

[58] Z. Chen, D. Stamoulis, and D. Marculescu, “Profit: Priority and power/performance optimization for many-core systems,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 37, pp. 2064–2075, Oct. 2018.

[59] C. Imes, S. Hofmeyr, and H. Hoffman, “Energy-efficient application resource scheduling using machine learning classifiers,” in International Conference on Parallel Processing, Aug. 2018.

[60] S. J. Lu, R. Tessier, and W. Burleson, “Reinforcement learning for thermal-aware many-core task allocation,” in Proceedings of the 25th edition on Great Lakes Symposium on VLSI, May 2015.

[61] D. Nemirovsky, T. Arkose, N. Markovic, M. Nemirovsky, O. Unsal, and A. Cristal, “A machine learning approach for performance prediction and scheduling on heterogeneous cpus,” in International Symposium on Computer Architecture and High Performance Computing, Oct. 2017.

[62] H. Zhang, B. Tang, X. Geng, and H. Ma, “Learning driven parallelization for large-scale video workload in hybrid cpu-gpu cluster,” in International Conference on Parallel Processing, Aug. 2018.

[63] R. Bitirgen, E. Ipek, and J. F. Martinez, “Coordinated management of multiple interacting resources in chip multiprocessors: A machine learning approach,” in International Symposium on Microarchitecture, Nov. 2008.

[64] W. Wang, J. W. Davidson, and M. L. Soffa, “Predicting the memory bandwidth and optimal core allocations for multi-threaded applications on large-scale NUMA machines,” in International Symposium on High Performance Computer Architecture, Mar. 2016.

Page 46: AI for Architecture - Oregon State University

References Cited

[65] R. Jain, P. R. Panda, and S. Subramoney, “Machine learned machines: Adaptive co-optimization of caches, cores, and on-chip network,” in Design, Automation and Test in Europe, Mar. 2016.

[66] G. Wu, Y. Xu, D. Wu, M. Ragupathy, Y. yen Mo, and C. Chu, “Flip-flop clustering by weighted k-means algorithm,” in Design Automation Conference, June 2016.

[67] H. Esmaeilzadeh, A. Sampson, L. Ceze, and D. Burger, “Neural acceleration for general-purpose approximate programs,” in International Symposium on Microarchitecture, Dec. 2012.

[68] A. Yazdanbakhsh, J. Park, H. Sharma, P. Lotfi-Kamran, and H. Esmaeilzadeh, “Neural acceleration for gpu throughput processors,” in International Symposium on Microarchitecture, Dec. 2015.

[69] B. Grigorian, N. Farahpour, and G. Reinman, “Brainiac: Bringing reliable accuracy into neurally-implemented approximate computing,” in International Symposium on High Performance Computer Architecture, Feb. 2015.

[70] D. Mahajan, A. Yazdanbaksh, J. Park, B. Thwaites, and H. Esmaeilzadeh, “Towards statistical guarantees in controlling quality tradeoffs for approximate acceleration,” in International Symposium on Computer Architecture, June 2016.

[71] G. F. Oliveira, L. R. Goncalves, M. Brandalero, A. C. S. Beck, and L. Carro, “Employing classification-based algorithms for general-purpose approximate computing,” in Design Automation Conference, June 2018.

[84] P. Braun and H. Litz, “Understanding Memory Access Patterns for Prefetching,” in International Workshop on AI-assisted Design for Architecture, June 2019.

[85] S. Tarsa, C.-K. Lin, G. Keskin, G. Chinya, and H. Wang, “Improving Branch Prediction by Modeling Global History with Convolutional Neural Networks,” in International Workshop on AI-assisted Design for Architecture, June 2019.

[86] N. Agarwal, T. Jain, and M. Zahran, “Performance Prediction for Multi-threaded Applications,” International Workshop on AI-assisted Design for Architecture, June 2019.

[87] N. Ardalani, U. Thakker, A. Albarghouthi, and K. Sankaralingam, “A Static Analysis-based Cross-Architecture Performance Prediction Using Machine Learning,” International Workshop on AI-assisted Design for Architecture, June 2019.

Page 47: AI for Architecture - Oregon State University

References Cited

[88] K. Wang, A. Louri, A. Karanth, R. Bunescu, “IntelliNoC: A Holistic Design Framework for Energy-Efficient and Reliable On-Chip Communication for Manycores,” International Symposium on Computer Architecture, June 2019.

[89] S. Tarsa, R. B. R. Chowdhury, J. Sebot, G. Chinya, J. Gaur, K. Sankaranarayanan, C.-K. Lin, R. Chappell, R. Singhal, and H. Wang, “Post-Silicon CPU Adaptation Made Practical Using Machine Learning,” International Symposium on Computer Architecture, June 2019.

[90] Y. Ding, N. Mishra, and H. Hoffman, “Generative and Multi-phase Learning for Computer Systems Optimization,” International Symposium on Computer Architecture, June 2019.

[91] T.-R. Lin, D. Penney, M. Pedram, and L. Chen, “A Deep Reinforcement Learning Framework for Architectural Exploration: A Routerless NoC Case Study,” International Symposium on High-Performance Computer Architecture, Feb. 2020.

[92] E. Bhatia, G. Chacon, S. Pugsley, E. Teran, P. V. Gratz, and D. Jimenez, “Perceptron-Based Prefetch Filtering,” International Symposium on Computer Architecture, June 2019.

[93] E. Garza, S. Mirbagher-Ajorpaz, T. A. Khan, and D. Jimenez, “Bit-level Perceptron Prediction for Indirect Branches,” International Symposium on Computer Architecture, June 2019.

Page 48: AI for Architecture - Oregon State University

Additional References

[1] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. The MIT Press, 2016.

[2] S. Kotsiantis, “Supervised machine learning: A review of classification techniques,” in Proceedings of the 2007 Conference on Emerging Artificial Intelligence Applications in Computer Engineering: Real World AI Systems with Applications in eHealth, HCI, Information Retrieval and Pervasive Technologies, pp. 3–24, 2007.

[3] N. Mishra, H. Zhang, J. D. Lafferty, and H. Hoffman, “A probabilistic graphical model-based approach for minimizing energy under performance constraints,” in International Conference on Architectural Support for Programming Languages and Operating Systems, Mar. 2015.

[4] J. Shlens, “A tutorial on principal component analysis,” 2014. arXiv:1404.1100.

[5] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, USA: MIT Press, 2nd ed., 2018.

[6] I. Guyon and A. Elisseeff, “An introduction to variable and feature selection,” The Journal of Machine Learning Research, vol. 3, pp. 1157–1182, Mar. 2003.

[7] J. Li, K. Chen, S. Wang, F. Morstatter, R. P. Trevino, J. Tang, and H. Liu, “Feature selection: A data perspective,” ACM Computing Surveys, vol. 50, Jan. 2018.

[9] B. Ozisikyilmaz, G. Memik, and A. Choudhary, “Machine learning models to predict performance of computer system design alternatives,” in International Conference on Parallel Processing, Sept. 2008.

[19] K. O’Neal, P. Brisk, E. Shriver, and M. Kishinevsky, “Halwpe: Hardware-assisted light weight performance estimation for gpus,” in Design Automation Conference, June 2017.

[46] Z. Qian, D.-C. Juan, P. Bogdan, C.-Y. Tsui, D. Marculescu, and R. Marculescu, “Svr-noc: A performance analysis tool for network-on-chips using learning-based support vector regression model,” in Design, Automation and Test in Europe, Mar. 2013.

[56] M. Allen and P. Fritzsche, “Reinforcement learning with adaptive kanerva coding for xpilot game ai,” in IEEE Congress of Evolutionary Computation, June 2011.

Page 49: AI for Architecture - Oregon State University

Additional References

[72] F. N. Taher, J. Callenes-Sloan, and B. C. Schafer, “A machine learning based hard fault recuperation model for approximate hardware accelerators,” in Design Automation Conference, June 2018.

[73] X. Chen, Z. Xu, H. Kim, P. Gratz, J. Hu, M. Kishinevsky, and U. Ogras, “In-network monitoring and control policy for dvfs of cmp networks-on-chip and last level caches,” in International Symposium on Networks-on-Chip, May 2012.

[74] R. Sutton, “Generalization in reinforcement learning: Successful examples using sparse coarse coding,” in International Conference on Neural Information Processing Systems, June 1996.

[75] J. A. Boyan and A. W. Moore, “Learning evaluation functions to improve optimization by local search,” The Journal of Machine Learning Research, Sep. 2001.

[76] P. Bratley and B. L. Fox, “Algorithm 659: Implementing sobol’s quasirandom sequence generator,” ACM Transactions on Mathematical Software, vol. 14, Mar. 1988.

[77] S. Han, J. Pool, J. Tran, and W. J. Dally, “Learning both weights and connections for efficient neural networks,” Oct. 2015. arXiv:1506.02626.

[78] D. C. Mocanu, E. Mocanu, P. Stone, P. H. Nguyen, M. Gibescu, and A. Liotta, “Scalable training of artificial neural networks with adaptive sparse connectivity inspired by network science,” Nature Communications, vol. 9, June 2018. DOI: 10.1038/s41467-018-04316-3.

[79] T. Sherwood, E. Perelman, and B. Calder, “Basic block distribution analysis to find periodic behavior and simulation points in applications,” in International Conference on Parallel Architectures and Compilation Techniques, Sept. 2001.

[80] R. M. Kretchmar, “Reinforcement learning algorithms for homogenous multi-agent systems,” in Workshop on Agent and Swarm Programming, 2003.

[81] R. Boyapati, J. Huang, P. Majumder, K. H. Yum, and E. J. Kim, “Approx-noc: A data approximation framework for network-on-chip architectures,” in International Symposium on Computer Architecture, June 2017.

[82] A. Raha and V. Raghunathan, “Towards full-system energy-accuracy tradeoffs: A case study of an approximate smart camera system,” in Design Automation Conference, June 2017.

[83] A. Sampson, A. Baixo, B. Ransford, T. Moreau, J. Yip, L. Ceze, and M. Oskin, “Accept: A programmer-guided compiler framework for practical approximate computing,” University of Washington Technical Report, vol. 1, Jan. 2015.

Page 50: AI for Architecture - Oregon State University

Additional References

[94] A. Margaritov, D. Ustiugov, E. Bugnion, and B. Grot, “Virtual Address Translation via Learned Page Tables Indexes,” Conference on Neural Information Processing Systems, Dec. 2018.

[95] D. Jimenez and C. Lin, “Dynamic Branch Prediction with Perceptrons,” International Symposium on High-Performance Computer Architecture, Jan. 2001.

[96] B. Reagen, J. M. Hernandez-Lobato, R. Adolf, M. Gelbart, P. Whatmough, G.-Y. Wei, and D. Brooks, “A Case for Efficient Accelerator Design Space Exploration via Bayesian Optimization,” International Symposium on Low Power Electronics and Design, July 2017.

Page 51: AI for Architecture - Oregon State University

AI for Architecture:

Principles and Prospects

for the Next Paradigm

College of Engineering System Technology and Architecture Research (STAR) Lab

Drew Penney and Lizhong Chen

Page 52: AI for Architecture - Oregon State University

Case Study: ML-Enabled Routerless NoC Design

Motivation
• Route at source using loops

How to configure these loops?
• Evolutionary is unreliable
• Heuristics are inflexible

Page 53: AI for Architecture - Oregon State University

Case Study: ML-Enabled Routerless NoC Design

Model
• Why deep reinforcement learning?
  - No training set
  - Effective/flexible exploration framework

Page 54: AI for Architecture - Oregon State University

Case Study: ML-Enabled Routerless NoC Design

Implementation
• State/action/reward representation (sketched below)
  - State: N×N NoC → N² × N² hop count matrix
  - Action: 2 opposing points = rectangle
  - Reward: loops = good (unless constraint violated); low hop count = good
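A hedged sketch of this encoding (dimensions and reward weights are invented for illustration; see Lin [91] for the actual design):

```python
# Sketch of the representation: for an N x N routerless NoC, the state is
# an N^2 x N^2 matrix of pairwise hop counts; an action adds a rectangular
# loop defined by two opposing corner nodes.
import numpy as np

N = 4                                              # 4x4 NoC
hops = np.full((N * N, N * N), np.inf)             # pairwise hop counts (state)
np.fill_diagonal(hops, 0)                          # each node reaches itself

def reward(hops, loops_added, constraint_violated):
    """Toy reward: loops are good unless a constraint is violated;
    low average hop count is good. Weights are illustrative only."""
    if constraint_violated:
        return -1.0
    reachable = hops[np.isfinite(hops)]
    return loops_added - 0.1 * reachable.mean()

print(reward(hops, loops_added=1, constraint_violated=False))
```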

Page 55: AI for Architecture - Oregon State University

Case Study: ML-Enabled Routerless NoC Design

Results
• 4x4 NoC = seconds
• 10x10 NoC = minutes
• Highly regular & high diversity
• 3.2x higher throughput, 1.6x lower latency, 5x lower power compared to mesh

Lin [91]