Cache Parameters


Page 1: Cache Parameters


Cache Parameters

• Cache size: Scache (lines)

• Set number: N (sets)

• Line number per set: K (lines/set)

Scache = KN (lines) = KN * L (bytes), where L is the line size in bytes.

A cache with K lines per set is called K-way set-associative.
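For instance (illustrative numbers, not from the slides): a 4-way set-associative cache (K = 4) with N = 128 sets and 64-byte lines (L = 64) has Scache = KN = 512 lines = 512 * 64 bytes = 32 KB.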

Page 2: Cache Parameters


Trade-offs in Set-Associativity

Fully-associative:

- Higher hit ratio and concurrent search, but slow access when the associativity is large.

Direct mapping:

- Fast access (on a hit) and simple comparison.

- Trivial replacement algorithm.

- Problem with hit ratio. In the extreme case, if two blocks that map to the same cache block frame are used alternately, “thrashing” may happen.

Page 3: Cache Parameters


Note

Main memory size: Smain (blocks)

Cache memory size: Scache (blocks)

Let P = Smain / Scache. Since P >> 1, you need to search: the average search length is much greater than 1.

• Set-associativity provides a trade-off between:

- Concurrency in search.

- Average search/access time per block.
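To give a feel for the ratio P (illustrative numbers, not from the slides): with Smain = 2^20 blocks and Scache = 2^10 blocks, P = 2^10 = 1024, so on average about a thousand memory blocks compete for each cache block frame.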

Page 4: Cache Parameters


1 ≤ N ≤ Scache (number of sets):

- N = 1: fully associative

- 1 < N < Scache: set-associative

- N = Scache: direct-mapped

Page 5: Cache Parameters


Important Factors in Cache Design

• Address partitioning strategy (three dimensions of freedom).

• Total cache size / memory size.

• Workload.

Page 6: Cache Parameters


Address Partitioning

• Byte addressing mode.

• Cache memory size (data part) = NKL (bytes).

• Directory size (per entry): M - log2(N) - log2(L) bits, where M is the number of address bits.

• Reduce clustering (randomize accesses).

An M-bit address is partitioned into three fields: a tag of M - log2(N) - log2(L) bits, a set number of log2(N) bits, and a byte address within a line of log2(L) bits.
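As a concrete illustration of this partitioning, here is a minimal Python sketch (the 32-bit address width and the example values of N and L are assumptions for illustration, not from the slides):

def split_address(addr, num_sets, line_size, addr_bits=32):
    offset_bits = line_size.bit_length() - 1            # log2(L), L a power of two
    set_bits = num_sets.bit_length() - 1                # log2(N), N a power of two
    tag_bits = addr_bits - set_bits - offset_bits       # M - log2(N) - log2(L)
    offset = addr & (line_size - 1)                     # byte address within a line
    set_index = (addr >> offset_bits) & (num_sets - 1)  # set number
    tag = addr >> (offset_bits + set_bits)              # stored in the directory entry
    return tag, set_index, offset, tag_bits

# Example: N = 128 sets, L = 64-byte lines, 32-bit addresses (illustrative values).
print(split_address(0x1234ABCD, num_sets=128, line_size=64))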

Page 7: Cache Parameters


Note: there exists a knee in the curve.

[Figure: General Curve Describing Cache Behavior; miss ratio versus cache size.]

Page 8: Cache Parameters


…the data are sketchy and highly dependent on the method of gathering...

…designer must make critical choices using a combination of “hunches, skills, and experience” as supplement…

Hunch: “a strong intuitive feeling concerning a future event or result.”

Page 9: Cache Parameters


Basic Principle

• Typical workload study + intelligent estimates for the rest.

• Good engineering: a small degree of over-design.

• “30% rule”:

- Each doubling of the cache size reduces misses by 30% (Alan J. Smith, “Cache Memories,” Computing Surveys, Vol. 14, No. 3, Sep. 1982).

- It is a rough estimate only.
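For example, by this rough rule a quadrupling of the cache size (two doublings) would cut misses to about 0.7 * 0.7 ≈ 0.49 of the original, i.e. roughly half.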

Page 10: Cache Parameters


K: Associativity

• Bigger K: lower miss ratio.

• Smaller K is better in being:

- Faster.

- Cheaper.

- Simpler.

• K = 4 ~ 8 gets close to the best miss ratio.

Page 11: Cache Parameters


L: Line Size

• The atomic unit of transmission.

• Affects the miss ratio.

• Smaller L:

- Larger average delay.

- Less traffic.

- Larger average hardware cost for associative search.

- Larger possibility of “line crossers” (memory references spanning the boundary between two cache lines).

• Workload dependent.

• Typically 16 ~ 128 bytes.

Page 12: Cache Parameters


Cache Replacement Policy

• FIFO (first-in, first-out): replace the block loaded furthest in the past.

• LRU (least recently used): replace the block used furthest in the past.

• OPT (furthest future use): replace the block that will be used furthest in the future, i.e. do not retain lines whose next occurrence is in the most distant future.

Note: LRU performance is close to OPT for frequently encountered program structures.
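The three policies can be sketched in a few lines of Python (an illustration, not part of the slides; it assumes a fully associative cache of a given capacity and simply counts misses on a reference trace):

def count_misses(trace, capacity, policy):
    cache, misses = [], 0                     # list kept oldest-first
    for i, block in enumerate(trace):
        if block in cache:
            if policy == "LRU":               # a hit refreshes recency under LRU
                cache.remove(block)
                cache.append(block)
            continue
        misses += 1
        if len(cache) == capacity:            # cache full: choose a victim
            if policy in ("FIFO", "LRU"):
                cache.pop(0)                  # oldest loaded / least recently used
            else:                             # OPT: evict the block reused furthest away
                future = trace[i + 1:]
                victim = max(cache, key=lambda b: future.index(b)
                             if b in future else len(future))
                cache.remove(victim)
        cache.append(block)
    return misses

trace = [1, 2, 3, 1, 4, 2, 1, 3]              # an arbitrary illustrative trace
for policy in ("FIFO", "LRU", "OPT"):
    print(policy, count_misses(trace, capacity=3, policy=policy))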

Page 13: Cache Parameters

Example: Misses and Associativity

Small cache with four one-word blocks. Reference sequence: 0, 8, 0, 6, and 8.

A. Direct-mapped cache.

Blue text: data used at time t.

Black text: data used at time t-1.

5 misses for the 5 accesses

Page 14: Cache Parameters

Example: Misses and Associativity (cont’d)

Small cache with four one-word blocks. Reference sequence: 0, 8, 0, 6, and 8.

B. Two-way set-associative cache, LRU replacement policy.

Blue text: data used at time t.

Black text: data used at time t-1.

4 misses for the 5 accesses

Page 15: Cache Parameters

Example: Misses and Associativity (cont’d)

Small cache with four one-word blocks. Reference sequence: 0, 8, 0, 6, and 8.

C. Fully associative cache.

- Any memory block can be stored in any cache block frame.

Blue text: data used at time t.

Black text: data used at time t-1.

Red text: data used at time t-2.


3 misses for the 5 accesses
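The 5, 4, and 3 miss counts in parts A-C can be reproduced with a short simulation. This is a sketch, assuming blocks map to sets by block address mod the number of sets and LRU replacement within each set (for the fully associative case any policy gives the same count here, since only three distinct blocks are referenced):

def misses_set_assoc(trace, num_blocks, assoc):
    num_sets = num_blocks // assoc
    sets = [[] for _ in range(num_sets)]      # each list is ordered LRU -> MRU
    misses = 0
    for block in trace:
        s = sets[block % num_sets]            # set index = block address mod N
        if block in s:
            s.remove(block)                   # hit: refresh recency
        else:
            misses += 1
            if len(s) == assoc:               # set full: evict least recently used
                s.pop(0)
        s.append(block)                       # block is now most recently used
    return misses

trace = [0, 8, 0, 6, 8]
for assoc in (1, 2, 4):                       # direct-mapped, two-way, fully associative
    print(f"{assoc}-way: {misses_set_assoc(trace, num_blocks=4, assoc=assoc)} misses")
# Prints 5, 4 and 3 misses, matching cases A, B and C.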

Page 16: Cache Parameters


Program Structure

for i = 1 to n
  for j = 1 to n
    ...
  endfor
endfor

The last-in-first-out nature of such nested loops makes the recent past look like the near future, which is why LRU performs well on them.

Page 17: Cache Parameters


Problem with LRU

• Not good at mimicking sequential/cyclic access patterns.

Example: A B C D E F  A B C …  A B C …

Exercise: with a set size of 3, what is the miss ratio, assuming all 6 addresses map to the same set?
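A quick check of the exercise (a sketch assuming a single 3-way set with LRU): every reference misses, because each block is evicted just before it is needed again, so the miss ratio is 100%.

def lru_misses(trace, ways):
    cache, misses = [], 0                     # ordered LRU -> MRU
    for block in trace:
        if block in cache:
            cache.remove(block)               # hit: refresh recency
        else:
            misses += 1
            if len(cache) == ways:
                cache.pop(0)                  # evict the least recently used block
        cache.append(block)
    return misses

trace = list("ABCDEF") * 3                    # cyclic pattern A B C D E F repeated
print(lru_misses(trace, ways=3), "misses out of", len(trace))   # 18 out of 18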

Page 18: Cache Parameters


Performance Evaluation Methods for Workload

• Analytical modeling.

• Simulation.

• Measurement.

Page 19: Cache Parameters


Cache Analysis Methods

• Hardware monitoring:

- Fast and accurate.

- Not fast enough (for high-performance machines).

- Cost.

- Flexibility/repeatability.

Page 20: Cache Parameters


Cache Analysis Methods (cont’d)

• Address traces and machine simulator:

- Slow.

- Accuracy/fidelity.

- Cost advantage.

- Flexibility/repeatability.

- OS/other impacts: how to put them in?

Page 21: Cache Parameters


Trace Driven Simulation for Cache

• Workload dependence:

- Difficulty in characterizing the load.

- No generally accepted model.

• Effectiveness:

- Possible simulation for many parameters.

- Repeatability.

Page 22: Cache Parameters


Problems in Address Traces

• Representativeness of the actual workload (hard):

- Traces cover only a small fraction of the real workload.

- Diversity of user programs.

• Initialization transient:

- Use traces long enough to absorb the impact of cold misses.

• Inability to properly model multiprocessor effects.