Memory State Compressors for Gigascale Checkpoint/Restore

DESCRIPTION
Memory State Compressors for Gigascale Checkpoint/Restore. www.eecg.toronto.edu/aenao. Andreas Moshovos, [email protected]. Gigascale checkpoint/restore has several potential uses: debugging, runtime checking, reliability, and gigascale speculation.

TRANSCRIPT
Moshovos © 1
Memory State Compressors for Gigascale Checkpoint/Restore
Andreas Moshovos, [email protected]
www.eecg.toronto.edu/aenao
Gigascale Checkpoint/Restore
Several potential uses: debugging, runtime checking, reliability, and gigascale speculation.
[Figure: an instruction stream with a checkpoint, many subsequent instructions, and a restore trigger]
Key Issues & This Study
Track and restore memory state; I/O?
This work: memory state compression. Goals: minimize on-chip resources, minimize performance impact.
Contributions: used value prediction to simplify compression hardware; fast, simple, and inexpensive; benefits whether used alone or not.
Outline
Gigascale Checkpoint/Restore
Compressor Architecture: Challenges
Value-Prediction-Based Compressors
Evaluation
Our Approach to Gigascale CR (GCR)
Checkpoint: the blocks that were written into.
Current memory state + checkpoint = previous memory state.
Checkpoints can be large (megabytes), and we may want many of them.
[Figure: timeline with numbered steps: (1) checkpoint begins; (2) each memory block is checkpointed on its first write; (3) many instructions execute; (4) a restore trigger fires; (5) all checkpointed memory blocks are restored]
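The checkpoint-on-first-write scheme above can be sketched in software. This is a minimal, hypothetical model over a toy word-addressed memory; the `Memory` class and its method names are illustrative, not part of the hardware design.

```python
# Minimal sketch of checkpoint-on-first-write (hypothetical model, not
# the actual hardware). Memory is a dict of address -> value.

class Memory:
    def __init__(self):
        self.state = {}          # current memory contents
        self.checkpoint = None   # address -> old value, saved on first write

    def begin_checkpoint(self):
        self.checkpoint = {}

    def write(self, addr, value):
        # On the first write to a block after the checkpoint begins,
        # save its previous contents.
        if self.checkpoint is not None and addr not in self.checkpoint:
            self.checkpoint[addr] = self.state.get(addr, 0)
        self.state[addr] = value

    def restore(self):
        # Current memory state + checkpoint = previous memory state.
        self.state.update(self.checkpoint)
        self.checkpoint = None

mem = Memory()
mem.write(0x10, 1)
mem.begin_checkpoint()
mem.write(0x10, 99)      # first write: old value 1 is checkpointed
mem.write(0x10, 100)     # later writes to the same block save nothing
mem.restore()
assert mem.state[0x10] == 1
```

Note that only the first write per block pays the checkpointing cost, which is why checkpoint size grows with the write footprint rather than the write count.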
Checkpoint Storage Requirements
[Figure: maximum checkpoint size in bytes (1K to 16G) versus checkpoint interval in instructions (1K to 1G) for gcc, mesa, and twolf]
Architecture of a GCR Compressor
[Figure: the L1 data cache feeds an in-buffer, then a compressor and alignment network, then an out-buffer into main memory; the in-buffer and out-buffer sizes determine resource cost and performance]
Previous work: compressor = dictionary-based. Relatively slow, complex alignment, on the order of 10K transistors.
A 64K in-buffer gives ~3.7% average slowdown.
Our Compression Architecture
Standalone: comparable compression, fewer resources.
In combination: fewer resources (smaller in-buffer), better compression, better performance.
[Figure: the L1 data cache feeds an optional VP-compressor stage with simple alignment, then the in-buffer, dictionary compressor, alignment network, and out-buffer into main memory]
Value-Predictor-Based Compression
[Figure: each input value is compared against a value predictor; a predicted value is emitted as a single flag bit, while a mispredicted value is emitted as the flag bit followed by the value itself, forming the output stream]
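The flag-plus-value encoding can be sketched as follows. This is a hypothetical software model that emits (flag, value) pairs rather than packed bits, using a single last-outcome predictor (predict that the previous value repeats); the function names are illustrative.

```python
# Sketch of value-predictor-based compression with a last-outcome
# predictor. A predicted value costs one flag bit; a mispredicted
# value costs the flag bit plus the value itself.

def vp_compress(values):
    last = None
    out = []
    for v in values:
        if v == last:
            out.append((1, None))   # predicted: emit only a 1-bit flag
        else:
            out.append((0, v))      # mispredicted: emit flag + value
        last = v                    # update the predictor with the actual value
    return out

def vp_decompress(stream):
    last = None
    values = []
    for flag, v in stream:
        v = last if flag else v     # predicted values are regenerated
        values.append(v)
        last = v                    # decompressor tracks the same predictor state
    return values

data = [0, 22, 22, 22, 7]
assert vp_decompress(vp_compress(data)) == data
```

Because the decompressor updates its predictor with the same sequence of recovered values, compressor and decompressor stay in lockstep without transmitting any predictor state.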
Example
[Figure: compressing the input sequence 0, 22, 22 over time with a last-outcome predictor: the first 0 and the first 22 are mispredicted and emitted in full; the repeated 22 is predicted and emitted as a single flag bit]
Block VP-Based Compressor
Shown is a last-outcome predictor; we studied others (four combinations per word).
[Figure: a cache block of 16 words (word 0 through word 15), each checked by a single-entry predictor indexed by address; the output stream is a one-word header with one prediction bit per word, followed by the mispredicted words, with half-word alignment]
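A software sketch of the block-level scheme: a 16-word block compresses to a one-word header (one prediction bit per word) followed by only the mispredicted words. The 16-entry predictor table indexed by word position, the bit packing, and the function names are illustrative assumptions, not the exact hardware layout.

```python
# Sketch of block VP-based compression: header bit i is set when word i
# of the block was predicted; only mispredicted words go in the payload.
# Predictors are modeled as a 16-entry last-outcome table (assumption).

BLOCK_WORDS = 16

def compress_block(block, predictors):
    header, payload = 0, []
    for i, word in enumerate(block):
        if predictors[i] == word:
            header |= 1 << i        # bit set: word was predicted away
        else:
            payload.append(word)    # bit clear: word follows in the payload
        predictors[i] = word        # update the last-outcome predictor
    return header, payload

def decompress_block(header, payload, predictors):
    block, it = [], iter(payload)
    for i in range(BLOCK_WORDS):
        word = predictors[i] if header & (1 << i) else next(it)
        block.append(word)
        predictors[i] = word        # mirror the compressor's predictor updates
    return block

enc_pred = [0] * BLOCK_WORDS
dec_pred = [0] * BLOCK_WORDS
block = [0] * 12 + [5, 5, 7, 0]
header, payload = compress_block(block, enc_pred)
assert decompress_block(header, payload, dec_pred) == block
assert len(payload) == 3            # words 12, 13, 14 were mispredicted
```

A fully predicted block thus shrinks to a single header word, which is what makes the simple alignment network sufficient for the VP stage.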
Evaluation
Compression rates: compared with LZW.
Performance: as a function of in-buffer size.
Methodology
Simplescalar v3
SPEC CPU 2000 with reference inputs
Ignore the first checkpoint to avoid artificially skewing the results
Simulated up to 80 billion instructions (compression rates) and 5 billion instructions (performance)
8-way OOO superscalar
64K L1D and L1I caches, 1M unified L2
Compression Rate vs. LZW
[Figure: compression rates (0% to 100%) for gzip, vpr, gcc, mesa, mcf, equake, ammp, parser, gap, vortex, bzip2, twolf, and their average, comparing LZW-16 bits, LO, and LO+LZW at a 256M-instruction checkpoint interval]
Performance Degradation
LZW + 64K buffer: ~3.7% slowdown. LZW + LO + 1K buffer: 1.6% slowdown.
[Figure: relative performance (0.88 to 1.00) for gzip, vpr, gcc, mesa, mcf, equake, ammp, parser, gap, vortex, bzip2, twolf, and their average, comparing LZW 1K, LZW 64K, and LO+LZW 1K]
Summary
Memory state compression for gigascale CR has many potential applications.
We used simple value-prediction compressors: few resources, low complexity, fast.
They can be used alone, or combined with dictionary-based compressors for reduced on-chip buffering and better performance.
Main memory compression?