
15-745 Spring 2006

WaveScalar

S. Swanson, et al.

Computer Science and Engineering University of Washington

Presented by Brett Meyer


ILP in Modern Architecture

• Lots of available ILP in software

– Execute in parallel for greater performance

• Superscalar processors can’t tap it

– Serialized by PC

• Superscalar doesn’t scale

Data-flow approaches can cheaply leverage existing parallelism
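To make "available ILP" concrete, here is a minimal sketch that computes the data-flow limit of a small dependence graph: instruction count divided by the length of the longest dependence chain. The graph `deps`, the function names, and the unit-latency assumption are all hypothetical and are not taken from the slides.

```python
from functools import lru_cache

# Hypothetical dependence graph: instruction -> instructions it depends on.
deps = {
    "a": [],
    "b": [],
    "c": ["a", "b"],
    "d": ["a"],
    "e": ["c", "d"],
    "f": ["b"],
}

def critical_path_length(graph):
    """Length, in instructions, of the longest dependence chain (unit latency assumed)."""
    @lru_cache(maxsize=None)
    def depth(node):
        # An instruction can start one step after its deepest producer.
        return 1 + max((depth(p) for p in graph[node]), default=0)
    return max(depth(n) for n in graph)

def dataflow_ilp(graph):
    """Average instructions per step if only true dependences limit execution."""
    return len(graph) / critical_path_length(graph)

print(dataflow_ilp(deps))
```

A machine that fetches in program-counter order may not reach this limit even when the dependences allow it, which is the point the slide makes about serialization by the PC.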


WaveScalar

• Introduction

• WaveCache and WaveScalar ISA

• Evaluation and Results

• Does WaveCache make sense?

• Compiler challenges


WaveScalar: Basics

• ALU-in-cache data-flow architecture

– No centralized, broadcast-based resources

• Compile data-flow binaries


WaveScalar: Waves

• Instructions architecture

• Programs broken into waves

– Block with single entry

• Use wave number to tag data

– Disambiguates data from multiple iterations
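The tagging idea can be sketched as follows: an instruction buffers incoming operands per wave number and fires only when both operands carrying the same wave number have arrived. The class `AddInstruction`, its `receive` method, and the two-port layout are illustrative assumptions, not the WaveScalar ISA.

```python
from collections import defaultdict

class AddInstruction:
    """Hypothetical two-input add that matches operands by wave number."""

    def __init__(self):
        # wave number -> {input port: value} for operands still waiting
        self.waiting = defaultdict(dict)

    def receive(self, wave, port, value):
        """Store one operand; return (wave, sum) once both ports are filled."""
        slots = self.waiting[wave]
        slots[port] = value
        if len(slots) == 2:
            del self.waiting[wave]
            return wave, slots[0] + slots[1]
        return None

add = AddInstruction()
results = [
    add.receive(wave=0, port=0, value=1),
    add.receive(wave=1, port=0, value=10),  # operand from a later iteration arrives early
    add.receive(wave=1, port=1, value=20),  # wave 1 completes first
    add.receive(wave=0, port=1, value=2),   # wave 0 completes afterwards
]
print(results)
```

Because operands are matched on the wave number, values from different loop iterations are never combined, even when they arrive interleaved.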


WaveScalar: Memory

• Relaxed program order

– Follow control-flow

– Obey dependencies

• Distributed store buffers

• Hardware coherence
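One way to picture a store buffer that lets memory operations arrive out of order while still applying them in order is sketched below. It assumes each operation carries a single sequence number along the executed control-flow path. That is a simplification introduced here for illustration, and the class and method names are hypothetical.

```python
class OrderedStoreBuffer:
    """Hypothetical buffer that applies memory operations in sequence order."""

    def __init__(self):
        self.memory = {}    # address -> value
        self.pending = {}   # sequence number -> (kind, address, value)
        self.next_seq = 0   # next sequence number allowed to touch memory

    def submit(self, seq, kind, addr, value=None):
        """Accept an operation in any order; return loads completed as (seq, value)."""
        self.pending[seq] = (kind, addr, value)
        completed = []
        # Drain every operation whose predecessors have all been applied.
        while self.next_seq in self.pending:
            k, a, v = self.pending.pop(self.next_seq)
            if k == "store":
                self.memory[a] = v
            else:
                completed.append((self.next_seq, self.memory.get(a)))
            self.next_seq += 1
        return completed

buf = OrderedStoreBuffer()
first = buf.submit(2, "load", 0x10)        # arrives early, must wait
second = buf.submit(1, "store", 0x10, 7)   # still waiting for seq 0
third = buf.submit(0, "store", 0x10, 5)    # unblocks 0, 1, 2 in order
print(first, second, third)
```

The load observes the value written by the store that precedes it in sequence order, regardless of arrival order.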


Evaluation

• WaveCache

– 4 MB of on-chip instructions + data, 2K ALUs

• WaveCache vs. superscalar

– 16-wide OOO, 1K registers, 1K window

• WaveCache vs. TRIPS

– 4 16-wide in-order cores, 2 MB on-chip cache

• Key assumption: perfect memory

Fair comparisons? Is it reasonable to assume perfect memory?


Results

• WaveCache outperforms superscalar

• Similar performance to TRIPS


Memory is the problem, not ILP

• Data-flow exposes greater ILP

• Memory not fast enough for low-ILP CPUs

– Processor-memory performance gap

• What does perfect memory hide?

– Does superscalar perform better?

• Did not model hardware coherence

WaveCache needs MORE bandwidth than a superscalar


Is WaveScalar Scalable?

• Sub-linear performance improvement

– More clusters further away from memory

• SPEC, MediaBench fit easily in memory

• What happens to performance when the working set doesn’t fit in WaveCache?


Compiler Challenges

• Wave identification

– Can waves be optimized for performance?

• Handling path explosion

– 1 BR/5 inst, 1050 loaded for 100 executed?
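The slide does not say how the 1050 figure was derived. One reading that happens to reproduce it is sketched below: with one branch per 5 instructions, 100 executed instructions span 20 blocks, and if level k of a reconverging control-flow graph holds k distinct blocks, loading every reachable block costs 5 × (1 + 2 + … + 20) instructions. This graph shape and the function names are assumptions made here. A fully diverging binary tree is shown for contrast.

```python
INSTS_PER_BRANCH = 5   # from the slide: 1 branch per 5 instructions
EXECUTED = 100         # from the slide: 100 instructions executed

# Number of 5-instruction blocks along the executed path.
blocks_on_path = EXECUTED // INSTS_PER_BRANCH

def loaded_reconverging(depth, block_size):
    """Assumed model: level k holds k distinct blocks because paths reconverge."""
    return block_size * sum(range(1, depth + 1))

def loaded_full_tree(depth, block_size):
    """Contrast model: every branch doubles the number of blocks at the next level."""
    return block_size * (2**depth - 1)

print(loaded_reconverging(blocks_on_path, INSTS_PER_BRANCH))
print(loaded_full_tree(blocks_on_path, INSTS_PER_BRANCH))
```

Under either model, far more instructions are loaded than executed, which is why the slide raises path explosion as a compiler concern.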


Compiler Challenges

• Semi-static instruction placement

– Fetch partial/complete waves

– Loads/stores close to memory

– Clustering neighboring instructions

– Reduce coherence traffic
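Why clustering neighboring instructions matters can be illustrated with a simple cost model: sum the grid distance between each producer and its consumer for a given placement. The 2-D grid, the Manhattan-distance cost, and the instruction names are assumptions chosen for illustration. The slides do not specify a placement algorithm or cost function.

```python
def communication_cost(edges, placement):
    """Total Manhattan distance between producer and consumer grid positions."""
    return sum(
        abs(placement[src][0] - placement[dst][0])
        + abs(placement[src][1] - placement[dst][1])
        for src, dst in edges
    )

# Hypothetical producer -> consumer edges of a small data-flow chain.
edges = [("load", "add"), ("add", "mul"), ("mul", "store")]

# Two hypothetical placements on a grid of ALUs, given as (row, column).
scattered = {"load": (0, 0), "add": (3, 3), "mul": (0, 3), "store": (3, 0)}
clustered = {"load": (0, 0), "add": (0, 1), "mul": (1, 1), "store": (1, 0)}

print(communication_cost(edges, scattered))
print(communication_cost(edges, clustered))
```

A placement pass could use a cost like this to compare candidate layouts, with extra terms for distance to memory if loads and stores should sit near it.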