A Regulated Transitive Reduction (RTR) for Longer Memory Race Recording (ASPLOS’06)

Min Xu, Rastislav Bodik, Mark D. Hill

Shimin Chen

LBA Reading Group Presentation

Why Do You Need a Recorder?

% gcc sim.c
% a.out
Segmentation fault
%

% gdb a.out
gdb> run
Program received SIGSEGV.
In get() at hash.c:45
45   a = bucket->d;

% gdb a.out
gdb> run
Program exited normally.
gdb>

% gcc para-sim.c
% a.out
Segmentation fault
%

% gcc para-sim.c
% a.out
Segmentation fault
Race recorded in “log”
%

% gdb a.out log
gdb> run
Program received SIGSEGV.
In get() at para-hash.c:67
67   a = bucket->d;

Ideally …

• Long recording: small log
• Low runtime overhead
• Low cost
• Applicability:
  • Programs – data races
  • Systems – non-SC

Flight Data Recorder (ISCA’03)

Full-system Record-Replay
• Recording memory races:
  • Assumes Sequential Consistency (SC)
  • Records the order of instruction interleaving
  • Targets cache-coherent multiprocessor servers
  • Piggybacks on the coherence protocol: little extra H/W
• Recording system states: SafetyNet
• Recording I/Os

Results:
• Non-trivial recording interval: 1 second
• Negligible runtime overhead: less than 2%
• Can be “Always On”

RTR (ASPLOS’06)
• Better memory race log compression
  • 1 byte per kilo-instruction
• Dealing with Total Store Ordering (TSO)

In this talk, I will try to describe a full picture combining FDR and RTR.

Outline

• Introduction
• Recording System State
• Recording Input/Output
• Recording Memory Races
• Dealing with TSO
• Summary

Recording System State (based on SafetyNet)

• Purpose: reconstruct the initial state (registers, TLB, main memory) at the beginning of the replay interval
• Policy: FDR’s 1-second replay interval
  • Take a logical checkpoint every 1/3 second
  • Reserve memory space to store logs for 4 checkpoints
• Logical checkpoint:
  • Quiesce the entire system to take a physical checkpoint
    • Registers and TLB states (4248 bytes/processor on SPARC V9)
  • Log the old value of a cache line upon its first update
    • Add an “already-updated” bit per cache line

[Figure from the FDR paper]
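The “log the old value upon first update” policy, together with the per-line “already-updated” bit, can be sketched as a copy-on-first-write undo log. This is a minimal Python model under simplifying assumptions. It covers a single checkpoint interval, not FDR’s four retained checkpoints. The names `CheckpointedMemory`, `store`, and `restore` are hypothetical.

```python
class CheckpointedMemory:
    """Hypothetical model of SafetyNet-style logging for one checkpoint interval."""

    def __init__(self, num_lines: int, line_size: int = 64):
        self.lines = [bytes(line_size) for _ in range(num_lines)]
        self.already_updated = [False] * num_lines  # one bit per cache line
        self.undo_log = []  # (line index, value at the time of the checkpoint)

    def take_checkpoint(self) -> None:
        # A new interval starts with an empty log and all bits cleared.
        self.undo_log.clear()
        self.already_updated = [False] * len(self.lines)

    def store(self, line: int, value: bytes) -> None:
        if not self.already_updated[line]:
            # First update since the checkpoint: save the old value exactly once.
            self.undo_log.append((line, self.lines[line]))
            self.already_updated[line] = True
        self.lines[line] = value

    def restore(self) -> None:
        # Undo every logged line to recover the checkpointed memory state.
        for line, old_value in reversed(self.undo_log):
            self.lines[line] = old_value
        self.take_checkpoint()


mem = CheckpointedMemory(num_lines=4, line_size=4)
mem.store(1, b"aaaa")
mem.store(1, b"bbbb")  # second update to line 1: nothing new is logged
mem.store(2, b"cccc")
print(len(mem.undo_log))  # 2
mem.restore()
print(mem.lines[1] == bytes(4))  # True
```

Because only the first update is logged, each modified line costs one log entry per interval. That is why a single bit per line is enough.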


Recording I/O

• I/O loads: log the loaded values
• Interrupts: log the instruction count + interrupt number
• DMA: log the DMA store values
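On replay, the I/O records stand in for the devices. Logged I/O load values are returned instead of reading a device, and interrupts are redelivered at the logged instruction count.

This Python sketch models only those two record kinds. DMA store values would be written into memory in a similar way and are omitted. `IOReplayer` and its methods are hypothetical. So is the choice to deliver an interrupt right after the instruction with the logged count retires.

```python
from collections import deque


class IOReplayer:
    """Hypothetical replay-side consumer of an FDR-style I/O log."""

    def __init__(self, io_load_values, interrupts):
        self.io_load_values = deque(io_load_values)  # in program order
        self.interrupts = deque(sorted(interrupts))  # (instruction count, number)
        self.ic = 0                                  # instructions retired so far
        self.delivered = []                          # (ic, interrupt number)

    def step(self):
        """Retire one instruction, then deliver interrupts logged at this count."""
        self.ic += 1
        while self.interrupts and self.interrupts[0][0] <= self.ic:
            _, number = self.interrupts.popleft()
            self.delivered.append((self.ic, number))

    def io_load(self):
        """Replay an I/O load from the log instead of touching the device."""
        value = self.io_load_values.popleft()
        self.step()
        return value


replayer = IOReplayer(io_load_values=[0x41, 0x42], interrupts=[(3, 14)])
replayer.step()              # IC 1: ordinary instruction
first = replayer.io_load()   # IC 2: returns 0x41 from the log
replayer.step()              # IC 3: interrupt 14 is redelivered here
second = replayer.io_load()  # IC 4: returns 0x42 from the log
print(first, second, replayer.delivered)  # 65 66 [(3, 14)]
```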


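Log All Dependences

[Figure: Thread I and Thread J during replay, with every dependence logged; each entry is a (sender IC, receiver IC) pair]

Log J: (2,3) (1,4) (3,5) (4,6)
Log I: (2,3)

Dependence log entry: 16 bytes
Log size: 5 × 16 = 80 bytes (10 integers)

But there are too many dependences.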

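Netzer’s Transitive Reduction (TR), Approximated by FDR

[Figure: Thread I and Thread J during replay; (1,4) is implied by (2,3) plus program order, so it is not logged]

TR-reduced log J: (2,3) (3,5) (4,6)
Log I: (2,3)

Log size: 4 × 16 = 64 bytes (8 integers)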
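FDR’s approximation of TR can be sketched for a single sender/receiver pair. Read the slides’ log entries as (sender IC, receiver IC) pairs, so that Log J: 23 14 35 46 becomes (2,3) (1,4) (3,5) (4,6).

A dependence is implied, and need not be logged, when an already-logged one orders a later (or the same) sender instruction before an earlier receiver instruction. Tracking the largest logged sender IC, which plays the role of VIC[sender], is enough to detect this.

This Python sketch makes three assumptions: it ignores paths through third threads, ICs start at 1, and `transitive_reduce` and `log_bytes` are hypothetical names. The 16-byte entry size is the slides’ figure.

```python
ENTRY_BYTES = 16  # one log entry = two 8-byte integers, as on the slides


def transitive_reduce(deps):
    """Drop dependences implied by program order plus an earlier logged one.

    deps: (sender_ic, receiver_ic) pairs from one sender thread to one receiver.
    """
    logged = []
    vic = 0  # largest sender IC logged so far (ICs are assumed to start at 1)
    # Visit in receiver order; for equal receivers, try the latest sender first.
    for sender_ic, receiver_ic in sorted(deps, key=lambda d: (d[1], -d[0])):
        if sender_ic <= vic:
            continue  # implied: a logged (s', r') has s' >= sender_ic, r' <= receiver_ic
        logged.append((sender_ic, receiver_ic))
        vic = sender_ic
    return logged


def log_bytes(*logs):
    """Total log size in bytes across the given per-thread logs."""
    return ENTRY_BYTES * sum(len(log) for log in logs)


log_j = [(2, 3), (1, 4), (3, 5), (4, 6)]
log_i = [(2, 3)]
reduced_j = transitive_reduce(log_j)
print(reduced_j)                    # [(2, 3), (3, 5), (4, 6)]
print(log_bytes(log_j, log_i))      # 80
print(log_bytes(reduced_j, log_i))  # 64
```

The 80-byte and 64-byte results match the slides’ full and TR-reduced log sizes.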

How to further reduce log size?

Actively creating artificial dependencies:
• Stricter
• Vectorized

The Intuition of the RTR Algorithm

[Figure: dependences after reduction, from I to J and from J to I, grouped into vectors]

Vectors “regulate” replay

Stricter Dependences to Aid Vectorization

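[Figure: Thread I and Thread J during replay (instructions include st A, ld D, sub, st C, ld B, st D); a stricter dependence replaces weaker ones]

New reduced log:
Log J: (2,3) (4,5)
Log I: (2,3)

Stricter dependences → reduced log: fewer dependences to log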
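The sketch below is one way to reproduce the slide’s new reduced log, read as (2,3) (4,5). It is a hypothetical reading of the example, not the paper’s exact algorithm.

It walks the I→J dependences in receiver order. Any dependence that is not already implied is made stricter as (r − ∆, r) when possible, reusing the previous dependence’s ∆ = receiver IC − sender IC. A later sender instruction still implies the original dependence, so replay stays correct as long as the new edge closes no cycle with the J→I dependences. The sketch checks that with a reachability search over program order plus logged edges.

`regulate` and `_reaches` are illustrative names.

```python
from collections import defaultdict


def _reaches(start, goal, cross_edges, max_ic):
    """DFS over program-order edges plus logged cross-thread dependence edges."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        thread, ic = node
        successors = list(cross_edges.get(node, []))
        if ic < max_ic:
            successors.append((thread, ic + 1))  # program order within a thread
        for nxt in successors:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def regulate(deps_i_to_j, deps_j_to_i):
    """Return logged I->J dependences (sender_ic, receiver_ic), made stricter
    where safe; deps_j_to_i are used only for the deadlock check."""
    cross = defaultdict(list)
    for j, i in deps_j_to_i:
        cross[("J", j)].append(("I", i))
    max_ic = max((ic for dep in deps_i_to_j + deps_j_to_i for ic in dep), default=0)

    logged, vic, delta = [], 0, None
    for s, r in sorted(deps_i_to_j, key=lambda d: (d[1], -d[0])):
        if s <= vic:
            continue  # already implied by an earlier logged dependence
        if delta is not None and r - delta >= s:
            stricter = r - delta
            # Adding I_stricter -> J_r deadlocks replay iff J_r already reaches it.
            if not _reaches(("J", r), ("I", stricter), cross, max_ic):
                s = stricter
        logged.append((s, r))
        cross[("I", s)].append(("J", r))
        vic, delta = s, r - s
    return logged


print(regulate([(2, 3), (1, 4), (3, 5), (4, 6)], [(2, 3)]))  # [(2, 3), (4, 5)]
print(regulate([(1, 2), (2, 5)], [(5, 4)]))                  # [(1, 2), (2, 5)]
```

In the first call, the stricter (4,5) also implies (4,6), so the TR check drops it. In the second call, the J→I dependence J5 → I4 would close a cycle with the candidate (4,5), so the original (2,5) is kept.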

Compress Vectorized Dependencies

[Figure: Thread I and Thread J during replay, with vector dependences]

Vectorized log:
Log J: x = 3, 5; ∆ = 1
Log I: x = 3; ∆ = 1

TR → RTR: fewer deps + fewer bytes/dep
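The vectorized log (Log J: x=3,5, ∆=1) can be read as grouping consecutive dependences that share ∆ = receiver IC − sender IC. Only the receiver ICs x and one ∆ per group are stored, and the sender IC is recovered as x − ∆.

This reading of the slide and the helper names `vectorize` and `devectorize` are assumptions. The sketch does not model how RTR packs these fields into fewer bytes.

```python
def vectorize(deps):
    """Group consecutive (sender_ic, receiver_ic) pairs that share the same delta."""
    vectors = []  # list of (receiver_ics, delta)
    for sender_ic, receiver_ic in deps:
        delta = receiver_ic - sender_ic
        if vectors and vectors[-1][1] == delta:
            vectors[-1][0].append(receiver_ic)
        else:
            vectors.append(([receiver_ic], delta))
    return vectors


def devectorize(vectors):
    """Recover the dependences: sender_ic = receiver_ic - delta."""
    return [(x - delta, x) for xs, delta in vectors for x in xs]


log_j = vectorize([(2, 3), (4, 5)])
log_i = vectorize([(2, 3)])
print(log_j)  # [([3, 5], 1)]
print(log_i)  # [([3], 1)]
```

With this reading, the TR-only log (2,3) (3,5) (4,6) needs two vectors, while the stricter (2,3) (4,5) fits in one. This is how stricter dependences aid vectorization.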

H/W Considerations

• IC: instruction count per core (easy)
• VIC[p]: records the largest timestamps previously seen from each sender p, for transitive reduction
• CTS[b]: timestamp per cache block
  • i.e., record the IC when a load/store commits
  • At commit time:
    • Figure out the memory address (how difficult?)
    • Write CTS: decoupled timestamp memory

H/W Considerations (cont’d)

• Piggyback on cache coherence messages:
  • FDR: CTS[b]
  • RTR: CTS[b] and the sender’s IC
• Logic to perform the algorithm at the receiver side:
  • FDR: integer comparison, update VIC[sender], generate a log record
  • RTR: in addition, max/min and integer subtraction
• Augment the directory structure:
  • Record the last owner of evicted blocks
• Caches must respond to inquiries about evicted blocks: reply with CTS[SET/LRU]
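The FDR side of this receiver logic can be modeled in software. Each core keeps IC, VIC[p], and a CTS[b] table. A coherence reply piggybacks CTS[b], and the receiver logs a record only when that timestamp exceeds VIC[sender].

This is a simplified Python sketch. `Core`, its methods, and the log record layout (sender, sender timestamp, receiver IC) are assumptions. RTR’s extra piggybacked sender IC, the directory changes, and evicted-block handling are not modeled.

```python
class Core:
    """Hypothetical per-core recording state: IC, VIC[p], CTS[b]."""

    def __init__(self, core_id, num_cores):
        self.id = core_id
        self.ic = 0                     # IC: instructions committed so far
        self.vic = [0] * num_cores      # VIC[p]: largest timestamp seen from p
        self.cts = {}                   # CTS[b]: IC of last committed access to b
        self.log = []                   # (sender, sender_ts, receiver_ic)

    def commit_memory_op(self, block):
        """Commit a load/store and record the IC in CTS[block]."""
        self.ic += 1
        self.cts[block] = self.ic

    def coherence_reply(self, block):
        """Reply for a requested block, piggybacking (sender id, CTS[block])."""
        return self.id, self.cts.get(block, 0)

    def receive(self, sender, sender_ts):
        """Receiver side: compare with VIC[sender]; log and update if newer."""
        if sender_ts > self.vic[sender]:
            self.log.append((sender, sender_ts, self.ic + 1))  # next instruction waits
            self.vic[sender] = sender_ts


core_i, core_j = Core(0, 2), Core(1, 2)
core_i.commit_memory_op("A")                  # I1 writes A
core_i.commit_memory_op("B")                  # I2 writes B
core_j.receive(*core_i.coherence_reply("B"))  # J1 reads B: logged
core_j.commit_memory_op("B")
core_j.receive(*core_i.coherence_reply("A"))  # J2 reads A: implied, not logged
core_j.commit_memory_op("A")
print(core_j.log)  # [(0, 2, 1)]
```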


Total Store Ordering

• FIFO write buffer
  • A store commits by placing its value into the write buffer
  • A store is ordered when it exits the write buffer and updates memory
  • Stores are ordered in commit order (FIFO)
• A load can obtain its value from the write buffer or from the memory system
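The write-buffer behaviour can be sketched in Python. The example is the classic store-buffering pattern: each core stores to one location and then loads the other. Both loads can return 0, an outcome that no sequentially consistent interleaving allows. This is why FDR’s SC assumption, and with it a log of the instruction interleaving, no longer suffices under TSO.

`TSOCore` and its methods are hypothetical, and memory is assumed to start at 0.

```python
from collections import deque


class TSOCore:
    """Hypothetical core with a FIFO write buffer in front of shared memory."""

    def __init__(self, memory):
        self.memory = memory          # shared dict: address -> value
        self.write_buffer = deque()   # committed but not yet ordered stores

    def store(self, addr, value):
        """Commit: place the store's value into the write buffer."""
        self.write_buffer.append((addr, value))

    def drain_one(self):
        """Order the oldest store: it leaves the buffer and updates memory."""
        addr, value = self.write_buffer.popleft()
        self.memory[addr] = value

    def load(self, addr):
        """Read the youngest buffered store to addr, else the memory system."""
        for buffered_addr, value in reversed(self.write_buffer):
            if buffered_addr == addr:
                return value
        return self.memory.get(addr, 0)


memory = {}
p0, p1 = TSOCore(memory), TSOCore(memory)
p0.store("x", 1)
p1.store("y", 1)
r0, r1 = p0.load("y"), p1.load("x")
print(r0, r1)        # 0 0: both loads bypass the other core's buffered store
print(p0.load("x"))  # 1: forwarded from p0's own write buffer
p0.drain_one()
p1.drain_one()
print(memory)        # {'x': 1, 'y': 1}
```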

Problems with TSO

[Figure: two TSO examples; /* XXX */ marks memory order]

The two examples create dependence cycles that would deadlock replay.

Solution

• Identify problematic load instructions:
  • Monitor invalidations in [t1, t2]
    • t1: the load (or the earlier store that feeds the load) is ordered at memory
    • t2: all preceding instructions are ordered
• Log the values of these loads and replay them by value
• HW: similar to the misspeculation-detection circuitry in SC systems (e.g., MIPS R10000)
• Insufficient to support Processor Consistency and other, more relaxed models
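The detection rule can be sketched as a filter over a core’s loads. A load is problematic if its cache block is invalidated at any time in [t1, t2], and only those loads have their values logged for value-based replay.

In this Python sketch, `LoadEvent`, the integer timestamps, and the (instruction count, value) log format are assumptions. Real hardware would make this check in the pipeline, much like the misspeculation detection it is compared to.

```python
from dataclasses import dataclass


@dataclass
class LoadEvent:
    ic: int     # instruction count of the load (used to replay it by value)
    block: str  # cache block the load reads
    value: int  # value the load returned
    t1: int     # the load (or the store feeding it) is ordered at memory
    t2: int     # all instructions preceding the load are ordered


def is_problematic(load, invalidations):
    """True if the load's block is invalidated anywhere in [t1, t2]."""
    return any(load.t1 <= t <= load.t2 and block == load.block
               for t, block in invalidations)


def value_log(loads, invalidations):
    """Log (ic, value) only for problematic loads."""
    return [(ld.ic, ld.value) for ld in loads if is_problematic(ld, invalidations)]


loads = [LoadEvent(ic=7, block="y", value=0, t1=2, t2=5),
         LoadEvent(ic=9, block="z", value=3, t1=3, t2=4)]
invalidations = [(4, "y"), (6, "z")]  # (time, block) seen by this core
print(value_log(loads, invalidations))  # [(7, 0)]
```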

Conclusion

RTR: 1 byte/kilo-instruction
• Based on Netzer’s transitive reduction
• Creates stricter dependences
• Vectorizes dependences to compress the log
• Avoids overly strict dependences, hence no deadlock
