Memory Consistency
Arbob Ahmad, Henry DeYoung, Rakesh Iyer
15-740/18-740: Recent Research in Architecture
October 14, 2009
“Memory Model = Instruction Reordering + Store Atomicity”
Arvind and Jan-Willem Maessen
● “Memory consistency models exist to describe and constrain the behavior of [memory systems]”
● Gives a unifying framework for SC and relaxed models with an atomic memory
Instruction Reordering vs. Store Atomicity
● Instruction reordering rules:
  ● Consistency within a thread
  ● e.g.:
● Store atomicity rules:
  ● Ordering which must exist in every serialization
  ● Consistency across threads
Store Atomicity
1. Predecessor Stores of a Load are ordered before its source.
2. Successor Stores of a Store are ordered after its observers.
Figure: x ← 2, x → 2, x ← 1 (x ← v stores v to x; x → v is a load of x that returns v)
3. Mutual ancestors of Loads are ordered before the mutual successors of the distinct Stores they observe.
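Rules 1 and 2 can be read as edge-adding rules on an ordering graph. The Python below is a minimal sketch under our own assumptions, not the authors' formalism: each operation is a hypothetical name mapped to ("st" or "ld", address), Fences are not modeled as nodes (their effect is encoded directly as local edges), `source` maps each load to the store it observes, and the two rules are applied until no new edges appear. Rule 3 and cycle detection (a cycle would mean the execution is not store-atomic) are left out.

```python
def reachability(nodes, edges):
    """Map each node to the set of nodes reachable from it (its successors)."""
    succ = {n: set() for n in nodes}
    for a, b in edges:
        succ[a].add(b)
    reach = {}
    for start in nodes:
        seen, stack = set(), list(succ[start])
        while stack:
            m = stack.pop()
            if m not in seen:
                seen.add(m)
                stack.extend(succ[m])
        reach[start] = seen
    return reach


def apply_rules_1_and_2(nodes, edges, source):
    """Add store-atomicity edges from rules 1 and 2 until a fixpoint is reached.

    nodes:  name -> (kind, addr), kind is "st" or "ld" (hypothetical encoding)
    edges:  set of (before, after) pairs: local ordering plus observation edges
    source: load name -> name of the store whose value it returns
    """
    edges = set(edges)
    while True:
        reach = reachability(nodes, edges)
        new = set()
        for load, src in source.items():
            addr = nodes[load][1]
            for s, (kind, a) in nodes.items():
                if kind != "st" or a != addr or s == src:
                    continue
                # Rule 1: a store that precedes the load is ordered before its source.
                if load in reach[s]:
                    new.add((s, src))
                # Rule 2: a store that succeeds the source is ordered after the load.
                if s in reach[src]:
                    new.add((load, s))
        new -= edges
        if not new:
            return edges
        edges |= new


# Hypothetical example: P does x ← 1 then x → 2; Q does x ← 2.
nodes = {"P1": ("st", "x"), "P2": ("ld", "x"), "Q1": ("st", "x")}
edges = {("P1", "P2"), ("Q1", "P2")}   # local order and observation edge
source = {"P2": "Q1"}
print(sorted(apply_rules_1_and_2(nodes, edges, source) - edges))
# [('P1', 'Q1')]: rule 1 orders x ← 1 before x ← 2
```

Iterating to a fixpoint matters because an edge added by one rule can make a new store a predecessor or successor that the other rule then picks up.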
Thread A: x ← 1; Fence; y → 2; y → 4
Thread B: y ← 2; Fence; z ← 6
Thread C: y ← 4; Fence; z → 6; Fence; x ← 8; x → ?
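One way to see what store atomicity permits in this three-thread example is to enumerate every serialization that respects the local ordering and in which each load returns the most recent store to its address. The Python sketch below does that by brute force; it is an illustration, not part of the paper. It assumes memory starts at 0, each Fence orders the operations around it, x ← 8 is ordered before the later load of x in Thread C, and (as the two candidate orders listed on a later slide suggest) Thread A's two loads of y are not ordered with respect to each other. Labels such as A1 are hypothetical.

```python
# Hypothetical labels for the example's operations: name -> (kind, addr, value).
# A value of None marks the load whose result we want to find (x → ?).
OPS = {
    "A1": ("st", "x", 1), "A2": ("ld", "y", 2), "A3": ("ld", "y", 4),
    "B1": ("st", "y", 2), "B2": ("st", "z", 6),
    "C1": ("st", "y", 4), "C2": ("ld", "z", 6),
    "C3": ("st", "x", 8), "C4": ("ld", "x", None),
}
# Local ordering (assumed): each Fence orders its neighbors; C3 → C4 is a
# store followed by a load of the same address. A2 and A3 are left unordered.
LOCAL = [("A1", "A2"), ("A1", "A3"), ("B1", "B2"),
         ("C1", "C2"), ("C2", "C3"), ("C3", "C4")]


def legal_serializations(ops, local):
    """Yield (order, reads) for every total order that respects `local` and in
    which each load returns the latest preceding store to its address."""
    preds = {n: {a for a, b in local if b == n} for n in ops}

    def dfs(order, memory, reads):
        if len(order) == len(ops):
            yield tuple(order), dict(reads)
            return
        placed = set(order)
        for n, (kind, addr, val) in ops.items():
            if n in placed or not preds[n] <= placed:
                continue
            if kind == "st":
                old = memory.get(addr, 0)
                memory[addr] = val
                yield from dfs(order + [n], memory, reads)
                memory[addr] = old
            else:
                seen = memory.get(addr, 0)
                if val is not None and seen != val:
                    continue  # prune: this load must return its annotated value
                reads[n] = seen
                yield from dfs(order + [n], memory, reads)
                del reads[n]

    yield from dfs([], {}, {})


results = list(legal_serializations(OPS, LOCAL))
print(len(results) > 0, {reads["C4"] for _, reads in results})
# x ← 1 (A1) precedes z → 6 (C2) in every legal serialization.
print(all(order.index("A1") < order.index("C2") for order, _ in results))
```

Under these assumptions the enumeration finds 8 as the only possible value of x → ?, and x ← 1 precedes z → 6 in every legal serialization even though no single local or observation edge connects them.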
Local ordering constraints
Observation constraints
Question: Are there any ordering constraints not represented?
Order is
y ← 2 : y → 2 : y ← 4 : y → 4
or
y ← 4 : y → 4 : y ← 2 : y → 2
● x ← 1 must precede both y → 2 and y → 4
● z → 6 must follow both y ← 2 and y ← 4
Store atomicity constraint: x ← 1 must precede z → 6
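Rule 3 can also be applied mechanically to the graph of local and observation edges. The Python sketch below (illustrative only) takes each pair of loads of the same address that observe distinct stores, intersects the ancestors of the two loads and the successors of the two stores, and proposes an edge from every mutual ancestor to every mutual successor. Restricting the rule to same-address loads is our assumption. The hypothetical labels encode the three-thread example: A1 = x ← 1, A2 = y → 2, A3 = y → 4, B1 = y ← 2, B2 = z ← 6, C1 = y ← 4, C2 = z → 6, C3 = x ← 8, C4 = x → ?, with each Fence assumed to order its neighbors.

```python
def reachability(nodes, edges):
    """Map each node to the set of nodes reachable from it."""
    succ = {n: set() for n in nodes}
    for a, b in edges:
        succ[a].add(b)
    reach = {}
    for start in nodes:
        seen, stack = set(), list(succ[start])
        while stack:
            m = stack.pop()
            if m not in seen:
                seen.add(m)
                stack.extend(succ[m])
        reach[start] = seen
    return reach


def rule3_edges(nodes, edges, source):
    """Edges implied by rule 3 that are not already in `edges`.

    nodes: name -> (kind, addr); source: load name -> observed store name.
    """
    reach = reachability(nodes, edges)
    ancestors = {n: {m for m in nodes if n in reach[m]} for n in nodes}
    new = set()
    loads = sorted(source)
    for i, l1 in enumerate(loads):
        for l2 in loads[i + 1:]:
            s1, s2 = source[l1], source[l2]
            # Assumed: the two loads read the same address but see distinct stores.
            if s1 == s2 or nodes[l1][1] != nodes[l2][1]:
                continue
            mutual_ancestors = ancestors[l1] & ancestors[l2]
            mutual_successors = reach[s1] & reach[s2]
            new |= {(a, d) for a in mutual_ancestors for d in mutual_successors}
    return new - set(edges)


nodes = {"A1": ("st", "x"), "A2": ("ld", "y"), "A3": ("ld", "y"),
         "B1": ("st", "y"), "B2": ("st", "z"),
         "C1": ("st", "y"), "C2": ("ld", "z"), "C3": ("st", "x"), "C4": ("ld", "x")}
local = {("A1", "A2"), ("A1", "A3"), ("B1", "B2"),
         ("C1", "C2"), ("C2", "C3"), ("C3", "C4")}
observe = {("B1", "A2"), ("C1", "A3"), ("B2", "C2")}
source = {"A2": "B1", "A3": "C1", "C2": "B2"}
print(sorted(rule3_edges(nodes, local | observe, source)))
# [('A1', 'C2'), ('A1', 'C3'), ('A1', 'C4')]
```

Only A1 → C2 (x ← 1 before z → 6) is new information here; the other two edges already follow from it through Thread C's Fence.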
Sequential Consistency
●Programmer's gold standard
●Question: How can we have the clarity of SC without sacrificing performance?
Improving the Performance of SC
Key Idea: Rather than enforcing ordering at individual memory access boundaries, enforce it only at chunk boundaries.
This is the topic of:
“BulkSC: Bulk Enforcement of Sequential Consistency”
Luis Ceze, James Tuck, Pablo Montesinos, and Josep Torrellas
“Mechanisms for Store-wait-free Multiprocessors”
Thomas Wenisch, Anastasia Ailamaki, Babak Falsafi, and Andreas Moshovos
Coarse-Grain Enforcement of SC
● Similar to tasks in thread-level speculation (TLS) and transactions in transactional memory (TM)
● But chunks are created dynamically by hardware; tasks and transactions are specified statically in code
Common Ground
• Dynamically divide the program into ‘chunks’ (BulkSC) or ‘atomic sequences’ (ASO)
• ASO begins an atomic sequence when an ordering constraint would stall instruction retirement.
• BulkSC assumes chunks of around 1,000 instructions.
• Reordering is allowed within chunks/atomic sequences.
• Updates are not visible to other processors until the chunk or sequence commits.
• Evaluated on a full-system simulator (Simics/Flexus)
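The shared mechanism above (speculative execution of a chunk, buffered updates, atomic commit, restart on conflict) can be sketched in a few lines of Python. This is a hypothetical, idealized model rather than either paper's hardware: it tracks exact read and write sets instead of signatures or cache bits, and it adopts the conservative rule that a chunk must restart if another processor's committed writes overlap anything the chunk read or wrote.

```python
class Chunk:
    """Idealized chunk (or atomic sequence): updates stay private until commit."""

    def __init__(self, memory):
        self.memory = memory      # shared memory, visible to all processors
        self.read_set = set()
        self.write_buffer = {}    # speculative updates, invisible to others

    def load(self, addr):
        self.read_set.add(addr)
        # Read our own buffered update first, else the globally visible value.
        return self.write_buffer.get(addr, self.memory.get(addr, 0))

    def store(self, addr, value):
        self.write_buffer[addr] = value

    def conflicts_with(self, committed_writes):
        """True if another chunk's committed writes touch our reads or writes."""
        return bool(committed_writes & (self.read_set | set(self.write_buffer)))

    def commit(self):
        """Make all buffered updates visible at once; return the write set."""
        self.memory.update(self.write_buffer)
        return set(self.write_buffer)


memory = {"x": 0, "y": 0}
p = Chunk(memory)                 # processor P: y = x + 1
p.store("y", p.load("x") + 1)
q = Chunk(memory)                 # processor Q: x = 5
q.store("x", 5)
q_writes = q.commit()             # Q commits first
print(memory, p.conflicts_with(q_writes))   # {'x': 5, 'y': 0} True
if p.conflicts_with(q_writes):
    p = Chunk(memory)             # squash P's chunk and re-execute it
    p.store("y", p.load("x") + 1)
p.commit()
print(memory)                     # {'x': 5, 'y': 6}
```

Because P's update stays in its buffer until commit, the squash discards nothing other processors have seen, and the re-executed chunk observes Q's committed value.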
BulkSC: Bulk Enforcement of Sequential Consistency
Chunk executes, updates L1
Commit: R and W signatures are broadcast; the Bulk Disambiguator computes signature intersections
- Restart computation if an intersection is non-empty
Computes minimum serialization requirement.
Enables BulkSC on machines without broadcast capabilities
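BulkSC summarizes a chunk's read and write addresses in R and W signatures, which are hash-based, Bloom-filter-like bit vectors. The Python below is a toy stand-in: the sizes (4 fields of 64 bits), the use of SHA-256 as the hash, and the class and method names are all assumptions, not BulkSC's actual encoding. It shows the property the intersection test relies on: if two address sets share an address, their signatures always intersect (no false negatives), while unrelated addresses may occasionally collide (false positives), which only causes unnecessary restarts.

```python
import hashlib

FIELDS = 4        # assumed: one hash-derived bit position per field
FIELD_BITS = 64   # assumed field width


def _bit_positions(addr):
    """One bit position per field, derived from a hash of the address."""
    digest = hashlib.sha256(str(addr).encode()).digest()
    return [digest[i] % FIELD_BITS for i in range(FIELDS)]


class Signature:
    def __init__(self):
        self.fields = [0] * FIELDS

    def insert(self, addr):
        for f, pos in enumerate(_bit_positions(addr)):
            self.fields[f] |= 1 << pos

    def may_intersect(self, other):
        """False only if the encoded address sets are definitely disjoint.

        A shared address sets the same bit in every field of both signatures,
        so the sets can only be disjoint if some field has no common bit.
        """
        return all(a & b for a, b in zip(self.fields, other.fields))


w_commit = Signature()            # W signature of the committing chunk
for addr in (0x1000, 0x2040):
    w_commit.insert(addr)
r_local = Signature()             # R signature of a chunk still executing
r_local.insert(0x2040)
print(r_local.may_intersect(w_commit))   # shared address: always True
```

In this model, a receiving processor would test the incoming W signature against both its own R and W signatures and restart its current chunk if either test returns True.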
Atomic Sequence Ordering (ASO)
• Scalable Store Buffer
  • Eliminates stalls caused by limited store buffer capacity.
  • No associative lookup required.
• ASO Implementation
  • Eliminates ordering-related stalls.
  • Atomic sequence tracking.
  • Detecting atomicity violations.
  • Rollback on violation.
  • Commit atomic sequences.
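The ASO steps listed above (tracking an atomic sequence, detecting violations, rolling back, committing) can be illustrated with a deliberately simplified Python model. Everything here is an assumption for illustration, not the paper's design: a sequence starts with a register checkpoint, records the cache lines it reads, buffers its stores in program order, treats an invalidation of any line it read as a possible atomicity violation, and commits only once every buffered store has write permission.

```python
class AtomicSequence:
    """Simplified model of one ASO-style atomic sequence (illustrative only)."""

    def __init__(self, registers):
        self.checkpoint = dict(registers)  # restored on rollback
        self.read_lines = set()            # lines read speculatively
        self.stores = []                   # (line, value) in program order
        self.violated = False

    def load(self, line):
        self.read_lines.add(line)

    def store(self, line, value):
        self.stores.append((line, value))

    def on_invalidation(self, line):
        # Another processor wants to write a line we read: atomicity may be broken.
        if line in self.read_lines:
            self.violated = True

    def try_commit(self, has_write_permission, memory):
        """Return 'rollback', 'wait', or 'commit'."""
        if self.violated:
            return "rollback"
        if not all(has_write_permission(line) for line, _ in self.stores):
            return "wait"
        for line, value in self.stores:    # drain stores in program order
            memory[line] = value
        return "commit"


memory = {}
seq = AtomicSequence({"r1": 0})
seq.load("A")
seq.store("B", 7)
print(seq.try_commit(lambda line: False, memory))          # wait
print(seq.try_commit(lambda line: True, memory), memory)   # commit {'B': 7}

seq2 = AtomicSequence({"r1": 0})
seq2.load("A")
seq2.on_invalidation("A")
print(seq2.try_commit(lambda line: True, memory), seq2.checkpoint)  # rollback {'r1': 0}
```

The model makes the trade-off visible: the processor keeps retiring instructions instead of stalling on ordering, at the cost of state (checkpoint, read tracking, buffered stores) that must be kept until the sequence commits or rolls back.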
Open Research Questions in Memory Consistency
● The memory model framework was descriptive. What are the prescriptive consequences?
● Can the “big-step” semantics of transactions be explained with a “small-step” framework?
● Can the same hardware in a single system be used for all of coarse-grain SC, TLS, and TM?
● ...
Thread A: x ← 1; Fence; y ← 2; y → 3
Thread B: y ← 3; Fence; x ← 4; x → ?
Question: We need one more edge to capture the ordering. Where should it go?