chuanjun zhang, uc riverside 1 using a victim buffer in an application- specific memory hierarchy...

12
Chuanjun Zhang, UC Riverside 1 Using a Victim Buffer in an Application-Specific Memory Hierarchy Chuanjun Zhang*, Frank Vahid** *Dept. of Electrical Engineering Dept. of Computer Science and Engineering University of California, Riverside **Also with the Center for Embedded Computer Systems at UC Irvine This work was supported by the National Science Foundation and the Semiconductor Research Corporation

Post on 22-Dec-2015

213 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Chuanjun Zhang, UC Riverside 1 Using a Victim Buffer in an Application- Specific Memory Hierarchy Chuanjun Zhang*, Frank Vahid** *Dept. of Electrical Engineering

Chuanjun Zhang, UC Riverside 1

Using a Victim Buffer in an Application-Specific Memory

Hierarchy

Chuanjun Zhang*, Frank Vahid***Dept. of Electrical Engineering

Dept. of Computer Science and EngineeringUniversity of California, Riverside

**Also with the Center for Embedded Computer Systems at UC Irvine

This work was supported by the National Science Foundation and the Semiconductor Research Corporation

Page 2: Chuanjun Zhang, UC Riverside 1 Using a Victim Buffer in an Application- Specific Memory Hierarchy Chuanjun Zhang*, Frank Vahid** *Dept. of Electrical Engineering

Frank Vahid, UC Riverside 2

Low Power/Energy Techniques are Essential

Hot enough to cook an egg.

Skadron et al., 30th ISCA

High performance processors are going to be too hot to work

Low energy dissipation is imperative for battery-driven embedded systems

Low power techniques are essential to both embedded systems and high performance processors

Page 3: Chuanjun Zhang, UC Riverside 1 Using a Victim Buffer in an Application- Specific Memory Hierarchy Chuanjun Zhang*, Frank Vahid** *Dept. of Electrical Engineering

Frank Vahid, UC Riverside 3

Caches Consume Much Power

>50%

Caches consume 50% of total processor system power

ARM920T and M*CORE(Segars 01, Lee 99)

Caches accessed often Consume dynamic power

Associativity reduces misses Less power off-chip, but more

power per access Victim buffer helps (Jouppi 90)

Add to direct-mapped cache Keep recently evicted lines in

small buffer, check on miss Like higher-associativity, but

without extra power per access 10% energy savings, 4%

performance improvement (Albera 99)

ProcessorCache

Victim buffer

Memory

Page 4: Chuanjun Zhang, UC Riverside 1 Using a Victim Buffer in an Application- Specific Memory Hierarchy Chuanjun Zhang*, Frank Vahid** *Dept. of Electrical Engineering

Frank Vahid, UC Riverside 4

Victim Buffer

With a victim buffer One cycle on a cache hit Two cycles on a victim

buffer hit Twenty two cycles on a

victim buffer miss Without a victim buffer

One cycle on a cache hit Twenty one cycles on a

victim buffer miss More accesses to off-chip

memory

OFFCHIP MEMORY

PROCESSOR

HIT

L1 cache

Victim buffer

One cycle

MISSHIT

Miss

22 cycles21 cycles

Two cycles

Page 5: Chuanjun Zhang, UC Riverside 1 Using a Victim Buffer in an Application- Specific Memory Hierarchy Chuanjun Zhang*, Frank Vahid** *Dept. of Electrical Engineering

Frank Vahid, UC Riverside 5

Cache Architecture with a Configurable Victim Buffer

Is a victim buffer a useful configurable cache parameter?

Helps for some applications

For others, not useful VB misses, so extra

cycle wasteful? Thus, want ability to

shut off VB for given app.

Hardware overhead One bit register A switch

Four-line victim buffer shown

control signals to the next level memory

SRAM

tag data

SRAMSRAM

CAM

Fully-associative victim buffer

27-bit tag 16-byte cache line data

VB on/off

reg

data from next level memory

Vdd victim line

data to processor

to mux

from cache control circuit

L1 cache

cache control circuit

control signals

s01

Page 6: Chuanjun Zhang, UC Riverside 1 Using a Victim Buffer in an Application- Specific Memory Hierarchy Chuanjun Zhang*, Frank Vahid** *Dept. of Electrical Engineering

Frank Vahid, UC Riverside 6

0%

50%

100%

pad

pcm crc

auto

2

bcn

t

bilv

bin

ary

blit

bre

v

g3fa

x

fir

pje

pg

ucb

qso

rt

v42

adpcm

epic

g721

peg

wit

mpeg

jpeg art

mcf

par

ser

vpr

Ave

Hit rate of victim buffer when added to an 8 Kbyte, 4 Kbyte, or 2 Kbyte direct-mapped cacheBenchmarks from Powerstone, MediaBench, and Spec 2000.

 

0%

50%

100%

padp

cm crc

auto

2

bcnt

bilv

bina

ry blit

brev

g3fa

x fir

pjep

g

ucbq

sort

v42

adpc

m

epic

g721

pegw

it

mpe

g

jpeg ar

t

mcf

pars

er vpr

Ave

8 Kbyte 4 Kbyte 2 Kbyte

Data cache

Hit Rate of a Victim Buffer

Instruction cache

Page 7: Chuanjun Zhang, UC Riverside 1 Using a Victim Buffer in an Application- Specific Memory Hierarchy Chuanjun Zhang*, Frank Vahid** *Dept. of Electrical Engineering

Frank Vahid, UC Riverside 7

Computing Total Memory-Related Energy

Consider CPU stall energy and off-chip memory energy

Excludes CPU active energy Thus, represents all memory-related energyenergy_mem = energy_dynamic + energy_static

energy_dynamic = cache_hits * energy_hit + cache_misses * energy_miss

energy_miss = energy_offchip_access + energy_uP_stall + energy_cache_block_fill

energy_static = cycles * energy_static_per_cycle

energy_miss = k_miss_energy * energy_hit

energy_static_per_cycle = k_static * energy_total_per_cycle

(we varied the k’s to account for different system implementations)

Underlined – measured quantities SimpleScalar (cache_hits, cache_misses, cycles)Our layout or data sheets (others)

Page 8: Chuanjun Zhang, UC Riverside 1 Using a Victim Buffer in an Application- Specific Memory Hierarchy Chuanjun Zhang*, Frank Vahid** *Dept. of Electrical Engineering

Frank Vahid, UC Riverside 8

Performance and Energy Benefits of Victim Buffer with a Direct-Mapped Cache

21% 24% 13% 38% 43% 60% 15%

-4%

0%

4%

8%

12%

padp

cm crc

auto

2

bcnt

bilv

bina

ry blit

brev

g3fa

x fir

pjeg

ucbq

sort

v42

adpc

m

epic

g721

pegw

it

mpe

g

jpeg ar

t

mcf

pars

er vpr

performance energy

An 8-line victim buffer with an 8 Kbyte direct-mapped cache (0%=DM w/o victim buffer)

Should shut-off VB

Substantial benefit

Configurable victim buffer is clearly useful to avoid performance penalty for certain applications

Page 9: Chuanjun Zhang, UC Riverside 1 Using a Victim Buffer in an Application- Specific Memory Hierarchy Chuanjun Zhang*, Frank Vahid** *Dept. of Electrical Engineering

Frank Vahid, UC Riverside 9

Is a Configurable Victim Buffer Useful Even With a Configurable Cache

We showed that a configurable cache can reduce memory access power by half on average

(Zhang/Vahid/Najjar ISCA 03, ISVLSI 03)

Software-configurable cache Associativity – 1, 2 or 4

ways Size: 2, 4 or 8 Kbytes

Does that configurability subsume usefulness of configurable victim buffer?

Line

2 Kbway

0.0

0.2

0.4

0.6

0.8

1.0

1 2 4

Associativity

Nor

mal

ized

ene

rgy

epic

mpeg2

Page 10: Chuanjun Zhang, UC Riverside 1 Using a Victim Buffer in an Application- Specific Memory Hierarchy Chuanjun Zhang*, Frank Vahid** *Dept. of Electrical Engineering

Frank Vahid, UC Riverside 10

Best Configurable Cache with VB Configurations

Optimal cache configuration when cache associativity, cache size, and victim buffer are all configurable.

I and D stands for instruction cache and data cache, respectively.

V stands for the victim buffer is on. nK stands for the cache size is n

Kbyte. The associativity is represented by

the last four characters Benchmark vpr, I2D1 stands for two-

way instruction cache and direct-mapped data cache.

Note that sometimes victim buffer should be on, sometimes off

Example Best Example Bestpadpcm I8KD4KI1D2 ucbqsort I4KDV4KI1D1

crc I2KDV4KI1D1 v42 I8KD8KI1D1

auto2 I4KD2KI1D1 adpcm I2KDV2KI1D1

bcnt I2KD2KI1D1 epic IV4KDV8KI1D1

bilv I4KD2KI1D1 jpeg I8KD2KI4D1

binary I4KD2KI1D1 mpeg2 I4KDV4KI1D1

blit I2KDV2KI1D1 g721 I8KDV2KI2D1

brev I4KD2KI1D1 art I4KDV2KI1D1

g3fax I4KDV2KI1D1 mcf I4KD4KI1D1

fir I4KD2KI1D1 parser I8KDV4KI4D1

pjepg I4KDV2KI1D1 vpr I8KD2KI2D1

pegw it I4KD4KI1D1

Page 11: Chuanjun Zhang, UC Riverside 1 Using a Victim Buffer in an Application- Specific Memory Hierarchy Chuanjun Zhang*, Frank Vahid** *Dept. of Electrical Engineering

Frank Vahid, UC Riverside 11

Performance and Energy Benefits of Victim Buffer Added to a Configurable Cache

32% 43% 23%

-4%

0%

4%

8%

12%

padp

cm crc

auto

2

bcnt

bilv

bina

ry blit

brev

g3fa

x fir

pjeg

ucbq

sort

v42

adpc

m

epic

g721

pegw

it

mpe

g

jpeg ar

t

mcf

pars

er vpr

performance energy

An 8-line victim buffer with a configurable cache, whose associativity, size, and line size are configurable (0%=optimal config. without VB)

Still surprisingly effective

Page 12: Chuanjun Zhang, UC Riverside 1 Using a Victim Buffer in an Application- Specific Memory Hierarchy Chuanjun Zhang*, Frank Vahid** *Dept. of Electrical Engineering

Frank Vahid, UC Riverside 12

Conclusion

Configurable victim buffer useful with direct-mapped cache

As much as 60% energy and 4% performance improvements for some applications

Can shut off to avoid performance penalty on other apps. Configurable victim buffer also useful with configurable

cache As much as 43% energy and 8% performance improvement for

some applications Can shut off to avoid performance overhead on other applications

Configurable victim buffer should be included as a software-configurable parameter to direct-mapped as well as configurable caches for embedded system architectures