Memory State Compressors for Gigascale Checkpoint/Restore

DESCRIPTION
Memory State Compressors for Gigascale Checkpoint/Restore. www.eecg.toronto.edu/aenao. Andreas Moshovos, [email protected]. Gigascale checkpoint/restore has several potential uses: debugging, runtime checking, reliability, and gigascale speculation.

TRANSCRIPT
Moshovos © 1
Memory State Compressors for Gigascale Checkpoint/Restore
Andreas Moshovos, [email protected]
www.eecg.toronto.edu/aenao
Gigascale Checkpoint/Restore
Several potential uses: debugging, runtime checking, reliability, and gigascale speculation.
[Figure: an instruction stream with a checkpoint, many subsequent instructions, and a restore trigger]
Key Issues & This Study
Track and restore memory state; I/O?
This work: memory state compression. Goals: minimize on-chip resources, minimize performance impact.
Contributions: used value prediction to simplify compression hardware; fast, simple, and inexpensive; benefits whether used alone or not.
Outline
Gigascale Checkpoint/Restore
Compressor Architecture: Challenges
Value-Prediction-Based Compressors
Evaluation
Our Approach to Gigascale CR (GCR)
Checkpoint: the blocks that were written into.
Current memory state + checkpoint = previous memory state.
Checkpoints can be large (megabytes), and we may want many of them.
[Figure: timeline with numbered steps: (1) checkpoint begins; (2) each memory block is checkpointed on its first write; (3) many instructions execute; (4) a restore trigger fires; (5) all checkpointed memory blocks are restored]
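The checkpoint-on-first-write scheme above can be sketched in software. This is a minimal, hypothetical model over a toy word-addressed memory; the `Memory` class and its method names are illustrative, not part of the hardware design.

```python
# Minimal sketch of checkpoint-on-first-write (hypothetical model, not
# the actual hardware). Memory is a dict of address -> value.

class Memory:
    def __init__(self):
        self.state = {}          # current memory contents
        self.checkpoint = None   # address -> old value, saved on first write

    def begin_checkpoint(self):
        self.checkpoint = {}

    def write(self, addr, value):
        # On the first write to a block after the checkpoint begins,
        # save its previous contents.
        if self.checkpoint is not None and addr not in self.checkpoint:
            self.checkpoint[addr] = self.state.get(addr, 0)
        self.state[addr] = value

    def restore(self):
        # Current memory state + checkpoint = previous memory state.
        self.state.update(self.checkpoint)
        self.checkpoint = None

mem = Memory()
mem.write(0x10, 1)
mem.begin_checkpoint()
mem.write(0x10, 99)      # first write: old value 1 is checkpointed
mem.write(0x10, 100)     # later writes to the same block save nothing
mem.restore()
assert mem.state[0x10] == 1
```

Note that only the first write per block pays the checkpointing cost, which is why checkpoint size grows with the write footprint rather than the write count.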
Checkpoint Storage Requirements
[Figure: maximum checkpoint size in bytes (1K to 16G) versus checkpoint interval in instructions (1K to 1G) for gcc, mesa, and twolf]
Architecture of a GCR Compressor
[Figure: the L1 data cache feeds an in-buffer, then a compressor and alignment network, then an out-buffer into main memory; the in-buffer and out-buffer sizes determine resource cost and performance]
Previous work: compressor = dictionary-based. Relatively slow, complex alignment, on the order of 10K transistors.
A 64K in-buffer gives ~3.7% average slowdown.
Our Compression Architecture
Standalone: comparable compression, fewer resources.
In combination: fewer resources (smaller in-buffer), better compression, better performance.
[Figure: the L1 data cache feeds an optional VP-compressor stage with simple alignment, then the in-buffer, dictionary compressor, alignment network, and out-buffer into main memory]
Value-Predictor-Based Compression
[Figure: each input value is compared against a value predictor; a predicted value is emitted as a single flag bit, while a mispredicted value is emitted as the flag bit followed by the value itself, forming the output stream]
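The flag-plus-value encoding can be sketched as follows. This is a hypothetical software model that emits (flag, value) pairs rather than packed bits, using a single last-outcome predictor (predict that the previous value repeats); the function names are illustrative.

```python
# Sketch of value-predictor-based compression with a last-outcome
# predictor. A predicted value costs one flag bit; a mispredicted
# value costs the flag bit plus the value itself.

def vp_compress(values):
    last = None
    out = []
    for v in values:
        if v == last:
            out.append((1, None))   # predicted: emit only a 1-bit flag
        else:
            out.append((0, v))      # mispredicted: emit flag + value
        last = v                    # update the predictor with the actual value
    return out

def vp_decompress(stream):
    last = None
    values = []
    for flag, v in stream:
        v = last if flag else v     # predicted values are regenerated
        values.append(v)
        last = v                    # decompressor tracks the same predictor state
    return values

data = [0, 22, 22, 22, 7]
assert vp_decompress(vp_compress(data)) == data
```

Because the decompressor updates its predictor with the same sequence of recovered values, compressor and decompressor stay in lockstep without transmitting any predictor state.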
Example
[Figure: compressing the input sequence 0, 22, 22 over time with a last-outcome predictor: the first 0 and the first 22 are mispredicted and emitted in full; the repeated 22 is predicted and emitted as a single flag bit]
Block VP-Based Compressor
Shown is a last-outcome predictor; we studied others (four combinations per word).
[Figure: a cache block of 16 words (word 0 through word 15), each checked by a single-entry predictor indexed by address; the output stream is a one-word header with one prediction bit per word, followed by the mispredicted words, with half-word alignment]
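A software sketch of the block-level scheme: a 16-word block compresses to a one-word header (one prediction bit per word) followed by only the mispredicted words. The 16-entry predictor table indexed by word position, the bit packing, and the function names are illustrative assumptions, not the exact hardware layout.

```python
# Sketch of block VP-based compression: header bit i is set when word i
# of the block was predicted; only mispredicted words go in the payload.
# Predictors are modeled as a 16-entry last-outcome table (assumption).

BLOCK_WORDS = 16

def compress_block(block, predictors):
    header, payload = 0, []
    for i, word in enumerate(block):
        if predictors[i] == word:
            header |= 1 << i        # bit set: word was predicted away
        else:
            payload.append(word)    # bit clear: word follows in the payload
        predictors[i] = word        # update the last-outcome predictor
    return header, payload

def decompress_block(header, payload, predictors):
    block, it = [], iter(payload)
    for i in range(BLOCK_WORDS):
        word = predictors[i] if header & (1 << i) else next(it)
        block.append(word)
        predictors[i] = word        # mirror the compressor's predictor updates
    return block

enc_pred = [0] * BLOCK_WORDS
dec_pred = [0] * BLOCK_WORDS
block = [0] * 12 + [5, 5, 7, 0]
header, payload = compress_block(block, enc_pred)
assert decompress_block(header, payload, dec_pred) == block
assert len(payload) == 3            # words 12, 13, 14 were mispredicted
```

A fully predicted block thus shrinks to a single header word, which is what makes the simple alignment network sufficient for the VP stage.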
Evaluation
Compression rates: compared with LZW.
Performance: as a function of in-buffer size.
Methodology
Simplescalar v3
SPEC CPU 2000 with reference inputs
Ignore the first checkpoint to avoid artificially skewing the results
Simulated up to 80 billion instructions (compression rates) and 5 billion instructions (performance)
8-way OOO superscalar
64K L1D and L1I caches, 1M unified L2
Compression Rate vs. LZW
[Figure: compression rates (0% to 100%) for gzip, vpr, gcc, mesa, mcf, equake, ammp, parser, gap, vortex, bzip2, twolf, and their average, comparing LZW-16 bits, LO, and LO+LZW at a 256M-instruction checkpoint interval]
Performance Degradation
LZW + 64K buffer: ~3.7% slowdown. LZW + LO + 1K buffer: 1.6% slowdown.
[Figure: relative performance (0.88 to 1.00) for gzip, vpr, gcc, mesa, mcf, equake, ammp, parser, gap, vortex, bzip2, twolf, and their average, comparing LZW 1K, LZW 64K, and LO+LZW 1K]
Summary
Memory state compression for gigascale CR has many potential applications.
We used simple value-prediction compressors: few resources, low complexity, fast.
They can be used alone, or combined with dictionary-based compressors for reduced on-chip buffering and better performance.
Main memory compression?