rtr: 1 byte/kilo- i nstruction race recording

RTR: 1 Byte/Kilo-InstructionRace Recording

Min Xu Rastislav Bodik Mark D. Hill

% gcc sim.c% a.outSegmentation fault%

% gdb a.outgdb> runProgram received SIGSEGV.In get() at hash.c:4545 a = bucket->d;

% gdb a.outgdb> runProgram exited normally.gdb>

% gcc para-sim.c% a.outSegmentation fault%

Why Do You Need a Recorder?

% gdb a.out loggdb> runProgram received SIGSEGV.In get() at para-hash.c:6767 a = bucket->d;

% gcc para-sim.c% a.outSegmentation faultRace recorded in “log”%

3Ideally …

% gdb a.out loggdb> runProgram received SIGSEGV.In get() at para-hash.c:6767 a = bucket->d;

% gcc para-sim.c% a.outSegmentation faultRace recorded in “log”%

Long recording:small logLow runtime

overheadLow cost

Applicability:Programs – data race

Systems – non-SC

4Better and Better Recorders

Low Cost

Low Overhead

Small Log Size

Applicability

Data Race Non-SC

InstRply ’87 Y Y

B. & G. ’91 YNetzer’93 YDéjà Vu

’98 YRecPlay

’00JaRec ’04

FDR ’03BugNet ’05

YReEnact

‘03 Y

CORD ‘06 YStrata ’06 YRTR ‘06 Y Y

1 Byte/Kilo-Instruction[ASPLOS’06]

A New Recorder

HardwareAcceleration

[ISCA’03]

LessHardware[ASPLOS’06]

SC & TSO[ASPLOS’06]

Result: One more step toward practical

This talk covers only RTR• Regulated Transitive Reduction algorithm

6Outline

Race Recording

RTR Algorithm

Results with Commercial Workloads

Compress log during recording replay more “regularly”

Conclusion

Technically, what’s race recording?

8Race Recording

print(X)

-X = X*5

X = X*5-

Thread IThread J

Original Replay

Recording

-X = X*5

Thread IThread J

Goal: Reproduce same conflicts with minimum log data

Terminologies and Assumptions

Thread I Thread J

Recording

Thread I Thread J

Replay

Conflicts(red)

Dependence(black)

Regulated Transitive Reduction (RTR)

11Log All Conflicts

Thread I Thread J

Replay

Log J: 23 14 35 46

Log I: 23

Log Size: 5*16=80 bytes(10 integers)

Dependence Log

16 bytes

But too many conflicts

12Netzer’s Transitive Reduction (TR)

Thread I Thread J

Replay

TR reduced Log J: 23

Log I: 23

Log Size: 64 bytes(8 integers)

TR Reduced Log

How to further reduce log size?

13The Intuition of the RTR Algorithm

After Reduction

From I to J

From J to I

Vectors

Vectors“Regulate” Replay

Stricter Dependences to Aid Vectorization

Thread I Thread J

Replay

st Ald D

5 5sub st C

6 6ld B st D

Log J: 23 45

Log I: 23

New Reduced Log

stricter

Reduced

Fewer dependencies to log

15Compress Vectorized Dependencies

Thread I Thread J

Replay

Log J: x=3,5, ∆=1

Log I: x=3, ∆=1

Vectorized Log

VectorDeps.

TRRTR: fewer deps + fewer byte/dep

16Deadlock Avoidance of RTR

Thread I Thread J

Recording

Limit the strict dependencies (see paper)

i:4j:1 j:2 i:3 i:4

Replay Cycle

Results with Commercial Workloads

18Full-system Simulation Method

Commercial server hardware• GEMS: http://www.cs.wisc.edu/gems• Full-system (OS + application) executions• 4-core CMP (Sequential Consistent)

• 1-way in-order issue, 2 GHz, • 64KB I/D L1, 4MB L2, 64byte lines, MOSI directory

Commercial server software• Apache – static web serving• SpecJBB – middleware• OLTP – TPC-C like• Zeus – static web serving

19Log Size: 1 byte/KI

2.0byte/core/KI

ApacheJBB OLTP Zeus AVG0

200KB/core/s

ApacheJBB OLTP Zeus AVG

Less buffer, longer recording, smaller logs

20RTR vs. Netzer’s TR

ApacheJBB OLTP ZeusAVG

e 28% smaller log• TR was “optimal”

21Why Does RTR Work Well?

RTR•Instructions execute at similar speed•Dependencies are often “vectorizable”

A New Recorder

HardwareAcceleration

[ISCA’03]

1 Byte/Kilo-Instruction[ASPLOS’06]

LessHardware[ASPLOS’06]

SC & TSO[ASPLOS’06]

Result: One more step toward practical

“Less hardware” & “TSO” not covered• Equally important• More details in the paper

23Conclusion

Race recording Counter nondeterminism

RTR 1 byte/kilo-instruction•Based on Netzer’s transitive reduction•Create stricter dependencies•Vectorize dependencies to compress log•Avoid overly-strict hence no deadlock

Future work•Support snooping, SMT, replayer

rtr: 1 byte/kilo- i nstruction race recording

gcc parasim

thread ithread jterminologies

gcc sim

long recording

recording replay

outsegmentation faultrace

loggdb runprogram

outgdb runprogram

Documents

nstruction manual uide dutilisation manual de

rtr-57u user's manual us - thermoworks · wireless data...

nstruction - use and care manuals

data collection - tandd.com · portable data collector -...

rtr reference notes

a plan to nstruction in all

page 1 | livret d'instructions |nstruction book manual

kilo 1000 units

rtr interoperability strategy principal software engineer,...

voluntarios 1 kilo

septodont rtr

intelligent variable speed & variable flow pump ......

kilo class submarine

tanesco distribution engineering instruction manual · kv...

ic7410_i nstruction manual

manual de nstrucciones nstruction manual

owner’s nstruction and operation manual - pdf.lowes.com

1 lending ssessment with nstruction rogram

foundation find the missing...

rtr-frto forms