15-745 Spring 2006
Wavescalar
S. Swanson, et al.
Computer Science and Engineering University of Washington
Presented by Brett Meyer
ILP in Modern Architecture
• Lots of available ILP in software
– Execute in parallel for greater performance
• Superscalar processors can’t tap it
– Serialized by PC
• Superscalar doesn’t scale
Data-flow approaches can cheaply leverage existing parallelism
Wavescalar
• Introduction
• WaveCache and Wavescalar ISA
• Evaluation and Results
• Does WaveCache make sense?
• Compiler challenges
Wavescalar: Basics
• ALU-in-cache data-flow architecture
– No centralized, broadcast-based resources
• Compile data-flow binaries
Wavescalar: Waves
• Instructions architecture
• Programs broken into waves
– Block with single entry
• Use wave number to tag data
– Disambiguates data from multiple iterations
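A minimal sketch of how wave-number tagging might disambiguate operands from different loop iterations. The names `MatchingTable` and `deliver` are hypothetical, not taken from the slides; the sketch only assumes that an instruction fires once all operands carrying the same wave number have arrived.

```python
from collections import defaultdict


class MatchingTable:
    """Collects operands per (wave number, instruction) until all have arrived."""

    def __init__(self, arity):
        self.arity = arity                 # instruction name -> operand count
        self.pending = defaultdict(dict)   # (wave, inst) -> {slot: value}

    def deliver(self, wave, inst, slot, value):
        """Store one tagged operand; return the operand list if inst can fire."""
        key = (wave, inst)
        self.pending[key][slot] = value
        if len(self.pending[key]) == self.arity[inst]:
            operands = self.pending.pop(key)
            return [operands[i] for i in range(self.arity[inst])]
        return None  # still waiting for operands with this wave number


table = MatchingTable({"add": 2})
# Operands from two iterations (waves 0 and 1) arrive interleaved.
first = table.deliver(0, "add", 0, 10)        # wave 0, left operand
second = table.deliver(1, "add", 0, 20)       # wave 1, left operand
fired_wave1 = table.deliver(1, "add", 1, 2)   # completes wave 1
fired_wave0 = table.deliver(0, "add", 1, 1)   # completes wave 0
```

Because the wave number is part of the key, the value 20 from the later iteration is never paired with the value 1 from the earlier one, even though they arrive back to back.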
Wavescalar: Memory
• Relaxed program order
– Follow control-flow
– Obey dependencies
• Distributed store buffers
• Hardware coherence
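One way to picture memory ordering that follows control flow is a simplified model in which each memory operation carries a (predecessor, sequence, successor) annotation within its wave, with `None` standing for a link that is unknown at compile time because of a branch. This is a sketch under those assumptions, not the actual store buffer design; `START` and `issue_in_wave_order` are hypothetical names.

```python
START = "start"  # hypothetical marker: operation has no predecessor in its wave


def issue_in_wave_order(arrivals):
    """Return sequence numbers in issue order.

    arrivals: list of (pred, seq, succ) tuples in arrival order.
    An operation issues only when it is linked to the last issued operation,
    either through its own pred field or through that operation's succ field.
    """
    waiting = list(arrivals)
    issued = []
    last = None
    while waiting:
        for op in waiting:
            pred, seq, _succ = op
            if last is None:
                ready = pred == START
            else:
                ready = pred == last[1] or last[2] == seq
            if ready:
                break
        else:
            break  # nothing is linked yet; remaining operations keep waiting
        waiting.remove(op)
        issued.append(seq)
        last = op
    return issued


# If-then-else: op 1 precedes the branch, op 2 is on the taken path, op 4 is the join.
# Operations arrive out of order but issue along the executed path.
order = issue_in_wave_order([(None, 4, None), (1, 2, 4), (START, 1, None)])

# If op 2 has not arrived, op 4 cannot be linked to op 1 and must wait.
stalled = issue_in_wave_order([(None, 4, None), (START, 1, None)])
```

In this model the join operation waits until the operation on the taken path shows up, which is what keeps the relaxed order consistent with the program's dependencies.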
Evaluation
• WaveCache
– 4 MB of on-chip instructions + data, 2K ALUs
• WaveCache vs. superscalar
– 16-wide OOO, 1K registers, 1K window
• WaveCache vs. TRIPS
– 4 16-wide in-order cores, 2 MB on-chip cache
• Key assumption: perfect memory
Fair comparisons? Is it reasonable to assume perfect memory?
Results
• WaveCache outperforms superscalar
• Similar performance to TRIPS
Memory is the problem, not ILP
• Data-flow exposes greater ILP
• Memory not fast enough for low-ILP CPUs
– Processor-memory performance gap
• What does perfect memory hide?
– Does superscalar perform better?
• Did not model hardware coherence
WaveCache needs MORE bandwidth than a superscalar
Is WaveScalar Scalable?
• Sub-linear performance improvement
– More clusters further away from memory
• SPEC, MediaBench fit easily in memory
• What happens to performance when the working set doesn’t fit in WaveCache?
Compiler Challenges
• Wave identification
– Can waves be optimized for performance?
• Handling path explosion
– 1 BR/5 inst 1050 loaded for 100 executed?
Compiler Challenges
• Semi-static instruction placement
– Fetch partial/complete waves
– Loads/stores close to memory
– Clustering neighboring instructions
– Reduce coherence traffic
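To illustrate why clustering neighboring instructions matters, here is a hedged sketch of one possible greedy placement: walk the data-flow graph depth-first and fill fixed-size clusters in visit order, so that producers and consumers tend to land together. The functions `place_depth_first` and `cross_cluster_edges`, the cluster capacity, and the example graph are all illustrative assumptions, not the placement policy from the paper.

```python
def place_depth_first(graph, roots, capacity):
    """Assign each instruction a cluster index by depth-first visit order.

    graph: instruction -> list of consumer instructions.
    capacity: number of instructions that fit in one cluster (assumed fixed).
    """
    placement = {}
    stack = list(reversed(roots))
    while stack:
        inst = stack.pop()
        if inst in placement:
            continue
        placement[inst] = len(placement) // capacity
        stack.extend(reversed(graph.get(inst, [])))
    return placement


def cross_cluster_edges(graph, placement):
    """Count producer-to-consumer edges that cross a cluster boundary."""
    return sum(
        1
        for src, dsts in graph.items()
        for dst in dsts
        if placement[src] != placement[dst]
    )


# Two independent dependence chains of four instructions each.
graph = {
    "a1": ["a2"], "a2": ["a3"], "a3": ["a4"],
    "b1": ["b2"], "b2": ["b3"], "b3": ["b4"],
}
clustered = place_depth_first(graph, ["a1", "b1"], capacity=4)

# Baseline: fill clusters in interleaved program order instead.
program_order = ["a1", "b1", "a2", "b2", "a3", "b3", "a4", "b4"]
interleaved = {inst: i // 4 for i, inst in enumerate(program_order)}

clustered_cost = cross_cluster_edges(graph, clustered)
interleaved_cost = cross_cluster_edges(graph, interleaved)
```

On this toy graph the depth-first placement keeps each chain inside one cluster, while the interleaved baseline sends two operands across clusters. A real placer would also have to weigh the other bullets above, such as keeping loads and stores close to memory.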