

CS 7810 Lecture 19

Coherence Decoupling: Making Use of Incoherence

J. Huh, J. Chang, D. Burger, G. Sohi

Proceedings of ASPLOS-XI

October 2004

Coherence / Consistency

• Coherence guarantees (i) that a write will eventually be seen by other processors, and (ii) write serialization (all processors see writes to the same location in the same order)

• The consistency model defines the ordering of reads and writes to different memory locations – the hardware guarantees a certain consistency model, and the programmer writes programs that are correct under those assumptions
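Write serialization, property (ii) of coherence, can be checked mechanically. The sketch below is a hypothetical checker, not part of any real protocol. It assumes every write to the single location carries a unique ID and that each processor reports the order in which it observed those writes. A processor may miss some writes, but all observations must fit one global order, so the checker looks for a cycle in the combined "seen before" relation.

```python
from graphlib import CycleError, TopologicalSorter


def write_serialized(observations):
    """observations: processor -> list of write IDs to ONE location,
    in the order that processor observed them (hypothetical format).

    Returns True if some single global order of the writes is consistent
    with every processor's view, i.e. writes to the location are serialized.
    """
    graph = {}  # write ID -> set of writes that must precede it
    for seq in observations.values():
        for earlier, later in zip(seq, seq[1:]):
            graph.setdefault(later, set()).add(earlier)
    try:
        TopologicalSorter(graph).prepare()  # raises CycleError on a cycle
        return True
    except CycleError:
        return False


# Both processors agree on w1 before w2: serialized.
print(write_serialized({"P1": ["w1", "w2"], "P2": ["w1", "w2"]}))
# P2 sees the writes in the opposite order: coherence is violated.
print(write_serialized({"P1": ["w1", "w2"], "P2": ["w2", "w1"]}))
```

The cycle check matters because pairwise comparison alone would miss a three-way disagreement, such as P1 seeing a before b, P2 seeing b before c, and P3 seeing c before a.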

Consistency Examples

Initially, A = B = 0

P1: A = 1; if (B == 0) { critical section }
P2: B = 1; if (A == 0) { critical section }

Initially, A = B = 0

P1: A = 1
P2: if (A == 1) B = 1
P3: if (B == 1) register = A

P1: Data = 2000; Head = 1
P2: while (Head == 0) { }; … = Data
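The first example shows why the consistency model matters. The sketch below is a simplified, hypothetical model in which each processor performs one store followed by one load, and a processor enters the critical section when its load returns 0. It enumerates every sequentially consistent (SC) interleaving and then runs one execution where stores wait in private store buffers. That buffering is a TSO-like relaxation assumed here for illustration.

```python
from itertools import permutations

# P1: A = 1; r1 = B      P2: B = 1; r2 = A   (r == 0 means "enter critical section")
OPS = [("P1", "st"), ("P1", "ld"), ("P2", "st"), ("P2", "ld")]


def sc_outcomes():
    """All (r1, r2) results over interleavings that respect program order."""
    results = set()
    for order in permutations(OPS):
        # Each processor's store must come before its own load.
        if order.index(("P1", "st")) > order.index(("P1", "ld")):
            continue
        if order.index(("P2", "st")) > order.index(("P2", "ld")):
            continue
        mem = {"A": 0, "B": 0}
        regs = {}
        for proc, kind in order:
            mine, other = ("A", "B") if proc == "P1" else ("B", "A")
            if kind == "st":
                mem[mine] = 1
            else:
                regs[proc] = mem[other]
        results.add((regs["P1"], regs["P2"]))
    return results


def store_buffer_outcome():
    """One execution where each store waits in a private store buffer."""
    mem = {"A": 0, "B": 0}
    buffers = {"P1": [("A", 1)], "P2": [("B", 1)]}  # stores not yet visible
    r1 = mem["B"]  # P1's load bypasses its own buffered store to A
    r2 = mem["A"]  # P2's load bypasses its own buffered store to B
    for buf in buffers.values():  # buffers drain to memory afterwards
        for addr, val in buf:
            mem[addr] = val
    return (r1, r2)


print((0, 0) in sc_outcomes())          # False: SC never lets both enter
print(store_buffer_outcome() == (0, 0))  # True: store buffering lets both enter
```

Under SC at most one processor reads 0. Once stores can be delayed behind later loads, both can read 0 and both enter the critical section, even though the coherence protocol itself is working correctly.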

Snooping-Based Cache Coherence

• Caches share a bus; every cache sees each transaction in the same cycle; every cache manages itself
• When one cache writes to a block, every other cache invalidates its copy of that block
• When a cache has a read miss, the block is provided by memory or the last writer
• Protocols are defined by states: MSI, MESI, MOESI

[Figure: four processors, each with its own caches, connected to a shared memory]
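The write-invalidate and read-miss rules above can be expressed as a small MSI state machine. This is a hypothetical, simplified sketch that tracks one block across n caches. It ignores data values, memory writebacks, and bus arbitration. It only shows how each cache's state changes as it snoops the transactions of the others.

```python
from enum import Enum


class State(Enum):
    M = "Modified"
    S = "Shared"
    I = "Invalid"


class SnoopingMSI:
    """Hypothetical MSI model: one block, n caches on a shared bus."""

    def __init__(self, n):
        self.states = [State.I] * n

    def read(self, cpu):
        if self.states[cpu] is State.I:  # read miss: broadcast a bus read
            for other, st in enumerate(self.states):
                if st is State.M:
                    # The last writer supplies the block and drops to Shared
                    # (a real protocol would also write it back to memory).
                    self.states[other] = State.S
            self.states[cpu] = State.S
        # Read hits in S or M need no bus transaction.

    def write(self, cpu):
        if self.states[cpu] is not State.M:  # write miss or upgrade
            for other in range(len(self.states)):
                if other != cpu:
                    self.states[other] = State.I  # every other copy is invalidated
            self.states[cpu] = State.M


bus = SnoopingMSI(4)
bus.read(0)
bus.read(1)
bus.write(2)  # caches 0 and 1 lose their copies
bus.read(0)   # cache 2 supplies the block; both end up Shared
print([s.name for s in bus.states])
```

The last line prints `['S', 'I', 'S', 'I']`. Cache 1 stays Invalid because it has not re-read the block since cache 2's write.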

Directory-Based Cache Coherence

• A directory keeps track of the sharing status of each block
• Every request goes to the directory and the directory then sends directives to each cache – the directory is the point of serialization (just as the bus is, in a snooping protocol)
• For example, on a write, the request reaches the directory, the directory sends invalidates to other sharers, and permissions are granted to the writer

[Figure: four processors, each with its own caches, connected through a network to memory and a directory]
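The write example above corresponds to a directory entry that records the sharers and any current owner of a block. The sketch below is hypothetical. The names `DirEntry`, `fetch`, `invalidate`, and `grant_exclusive` are illustrative and do not come from the slides. It also omits the acknowledgements a real directory waits for before granting permission. It only logs the directives the directory would send.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class DirEntry:
    """Hypothetical directory state for one block."""
    sharers: Set[int] = field(default_factory=set)
    owner: Optional[int] = None  # cache holding the block with write permission


class Directory:
    def __init__(self):
        self.entries = {}
        self.messages = []  # directives sent to caches, in order

    def _entry(self, block):
        return self.entries.setdefault(block, DirEntry())

    def read(self, cpu, block):
        e = self._entry(block)
        if e.owner == cpu:
            return  # already holds the block with write permission
        if e.owner is not None:
            # The owner supplies the data and keeps a shared copy.
            self.messages.append(("fetch", e.owner, block))
            e.sharers.add(e.owner)
            e.owner = None
        e.sharers.add(cpu)

    def write(self, cpu, block):
        e = self._entry(block)
        if e.owner == cpu:
            return
        # Every other copy must be invalidated before permission is granted.
        others = set(e.sharers)
        if e.owner is not None:
            others.add(e.owner)
        for target in sorted(others - {cpu}):
            self.messages.append(("invalidate", target, block))
        e.sharers.clear()
        e.owner = cpu
        self.messages.append(("grant_exclusive", cpu, block))


d = Directory()
for cpu in (0, 1, 3):
    d.read(cpu, 0x40)
d.write(2, 0x40)
print(d.messages)
```

Because every request for block 0x40 passes through one entry, the directory decides the order of competing requests. That is the sense in which it is the point of serialization.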

TLDS (Thread-Level Data Speculation)

• A certain ordering of reads and writes is assumed – if that ordering is violated, the thread is re-executed
• The coherence protocol is used to propagate writes

[Figure: Threads 1 to 4, each with its own caches, sharing memory]
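The violation check described in the TLDS bullets can be sketched as follows. This hypothetical model assumes threads are numbered in their intended sequential order and that a write from one thread reaches every logically later thread, which is how the coherence protocol propagates writes here. It simplifies in two ways. It ignores a later thread that wrote the address itself before reading it. It also flags only the reader, whereas real designs may also restart the threads after it.

```python
from collections import defaultdict


class SpeculationTracker:
    """Hypothetical TLDS-style dependence check for numbered threads."""

    def __init__(self):
        self.readers = defaultdict(set)  # address -> threads that have read it
        self.squashed = set()            # threads that must re-execute

    def load(self, thread, addr):
        self.readers[addr].add(thread)

    def store(self, thread, addr):
        # A logically LATER thread that already read addr consumed a stale
        # value: the assumed read/write ordering was violated.
        for reader in self.readers[addr]:
            if reader > thread:
                self.squashed.add(reader)


t = SpeculationTracker()
t.load(3, "x")   # thread 3 reads x early
t.load(1, "x")   # thread 1 reads x; it precedes thread 2, so that is fine
t.store(2, "x")  # thread 2's write arrives after thread 3's read
print(t.squashed)  # {3}
```

Only thread 3 is re-executed, because thread 1 is logically earlier than the writer and was supposed to see the old value.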

The Traditional Model

• No thread is speculative – a parallel application with synchronization points and parallel regions is guaranteed to execute correctly with no need for re-execution
• Threads wait at synchronization points and wait for the correct permissions for every block of data

[Figure: Threads 1 to 4, each with its own caches, sharing memory]

Coherence Decoupling

• A simple coherence protocol is often a slow protocol – for example, a simple protocol may not allow multiple outstanding requests

• Coherence decoupling: maintain a fast but possibly incorrect protocol alongside a slow but correct backing protocol; this incurs fewer stalls in the common case at the cost of occasional recoveries
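The tradeoff of "fewer stalls in the common case and occasional recoveries" can be quantified with a simple expected-latency model. This model is an assumption made for illustration and does not come from the slides. It assumes a correct speculation costs only the fast path's latency, because the backing protocol's confirmation is off the critical path. It assumes a misspeculation costs the full latency of the correct protocol plus a recovery penalty. The parameter values in the usage note are made up.

```python
def expected_latency(p_correct, spec_latency, full_latency, recovery_penalty):
    """Average latency of a coherence miss under a hypothetical cost model.

    Correct speculation: the load uses the value after spec_latency and the
    backing protocol's confirmation is off the critical path.
    Misspeculation: the load effectively waits full_latency and then pays
    recovery_penalty for squashing and re-executing.
    """
    correct = p_correct * spec_latency
    wrong = (1.0 - p_correct) * (full_latency + recovery_penalty)
    return correct + wrong


def break_even_accuracy(spec_latency, full_latency, recovery_penalty):
    """Accuracy above which speculating beats always waiting full_latency.

    Solves p*s + (1 - p)*(f + r) = f for p, giving p = r / (f + r - s).
    """
    return recovery_penalty / (full_latency + recovery_penalty - spec_latency)


# Hypothetical numbers: 10-cycle fast path, 100-cycle correct protocol,
# 20-cycle recovery. Even 50% accuracy beats always waiting 100 cycles.
print(expected_latency(0.5, 10, 100, 20))   # 65.0
print(break_even_accuracy(10, 100, 20))     # about 0.18
```

Under these assumptions, speculation pays off whenever accuracy exceeds r / (f + r - s). A cheap recovery mechanism therefore tolerates fairly poor prediction accuracy.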

Coherence Decoupling

• A coherence operation is broken into two components: (i) acquiring and using the value, (ii) receiving the correct set of permissions

SCL (Speculative Cache Lookup) Protocol

• Why does speculative cache look-up work?

False sharing: the line was invalidated because a different word in it was written, so the word being read still holds the correct value

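Silent stores or value locality: the store wrote the value that was already there, or the new value otherwise matches the stale copy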

If there is spare bandwidth, updated values can be pushed out to sharers
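The reasons above can be demonstrated with a stale cache line. In the hypothetical model below, an invalidated line keeps its old data instead of discarding it. On a coherence miss, the load reads the stale word speculatively, and the value later supplied by the backing protocol decides whether that guess was right. The names `Line`, `remote_write`, and `speculative_load` are illustrative only.

```python
class Line:
    """Hypothetical cache line that keeps its stale data after invalidation."""

    def __init__(self, words):
        self.words = list(words)
        self.valid = True


def remote_write(local, remote, offset, value):
    """Another processor writes one word; our copy is invalidated, not updated."""
    remote.words[offset] = value
    local.valid = False


def speculative_load(local, coherent, offset):
    """SCL on a coherence miss: use the stale local word right away, then
    compare it with the value the backing protocol eventually returns."""
    assert not local.valid, "only invalid (stale) lines are read speculatively"
    guess = local.words[offset]      # consumed immediately by the load
    actual = coherent.words[offset]  # arrives later with permissions
    return guess, guess == actual


mine = Line([7, 8, 9, 10])
theirs = Line(mine.words)

remote_write(mine, theirs, 1, 99)          # false sharing: word 1 changed
print(speculative_load(mine, theirs, 0))   # (7, True)

remote_write(mine, theirs, 0, 7)           # silent store: same value rewritten
print(speculative_load(mine, theirs, 0))   # (7, True)

remote_write(mine, theirs, 0, 5)           # true sharing with a new value
print(speculative_load(mine, theirs, 0))   # (7, False) -> recovery needed
```

Only the last case requires recovery. Pushing updated values to sharers when bandwidth is spare would also make that case succeed, because the stale copy would already hold 5.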

Implementation

• The Miss Status Holding Register (MSHR) keeps track of outstanding requests – it can buffer the speculative value and check that it matches the correct value when that arrives; on a mis-speculation, the instruction is handled like a branch misprediction

• Speculation on a coherence operation is no different from traditional forms of speculation
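The MSHR's role can be sketched as a small table of outstanding misses. Each entry remembers which load consumed a speculative value and what that value was. This is a hypothetical model, and the names `MSHRFile`, `issue`, `fill`, and `load_id` are illustrative. When the backing protocol delivers the block with permissions, `fill` compares the values and returns the load to restart from on a mismatch, in the same way a branch misprediction names the point to squash from.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class MSHREntry:
    """Hypothetical MSHR entry for one outstanding coherence miss."""
    load_id: int     # the load that consumed the speculative value
    spec_value: int  # value taken from the stale local copy


class MSHRFile:
    def __init__(self, capacity=8):
        self.capacity = capacity
        self.entries: Dict[int, MSHREntry] = {}  # block address -> entry

    def issue(self, addr, load_id, stale_value) -> Optional[int]:
        """Start a miss; return the value the load may use now, or None to stall."""
        if len(self.entries) >= self.capacity:
            return None  # no free MSHR: the load must wait
        self.entries[addr] = MSHREntry(load_id, stale_value)
        return stale_value

    def fill(self, addr, coherent_value) -> Optional[int]:
        """Backing protocol delivers the block; return a load to squash from, or None."""
        entry = self.entries.pop(addr)
        if entry.spec_value != coherent_value:
            return entry.load_id  # mis-speculation: recover like a branch mispredict
        return None


m = MSHRFile(capacity=2)
m.issue(0x100, load_id=5, stale_value=42)
m.issue(0x140, load_id=6, stale_value=3)
print(m.issue(0x180, load_id=7, stale_value=1))  # None: both MSHRs busy
print(m.fill(0x100, 42))                         # None: speculation confirmed
print(m.fill(0x140, 4))                          # 6: squash from load 6
```

Because several entries can be outstanding at once, loads keep running on speculative values while multiple coherence requests are in flight.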

Coherence Decoupling Components

Microbenchmark Behavior

Results

Results

Summary

• Arguments for coherence decoupling:
  Reduces protocol complexity
  Reduces programming complexity
  Marginal hardware overhead
  Coherence misses will emerge as greater bottlenecks?

• What is the expected trend for CMPs?
