
soc 4.1 Chapter 4 Memory Design: SOC and Board-Based Systems Computer System Design System-on-Chip by M. Flynn & W. Luk Pub. Wiley 2011 (copyright 2011)

Upload: amos-wilkins

Post on 21-Jan-2016



soc 4.1

Chapter 4 Memory Design: SOC and Board-Based Systems

Computer System Design: System-on-Chip
by M. Flynn & W. Luk

Pub. Wiley 2011 (copyright 2011)


soc 4.2

Cache and Memory

• cache
– performance
– cache partitioning
– multi-level cache

• memory

– off-die memory designs


soc 4.3

Outline for memory design


soc 4.4

Area comparison of memory tech.


soc 4.5

System environments and memory


soc 4.6

Performance factors

[Figure label: virtual address]

Factors:
1. physical word size
   • processor ↔ cache
2. block / line size
   • cache ↔ memory
3. cache hit time
   • cache size, organization
4. cache miss time
   • memory and bus
5. virtual-to-real translation time
6. number of processor requests per cycle


soc 4.7

Design target miss rates

beyond 1 MB: double the size, halve the miss rate


soc 4.8

System effects limit hit rate

• operating system affects the miss ratio
– about 20% increase
• so does multiprogramming (M)
– miss rates may not be affected by increased cache size
– Q = no. of instructions between task switches


soc 4.9

System effects

• cold-start
– short transactions are created frequently and run quickly to completion
• warm-start
– long processes are executed in time slices


soc 4.10

Some common cache types


soc 4.11

Multi-level caches: mostly on die

• useful for matching processor to memory
– generally at least 2-level
• for microprocessors: L1 at frequency of pipeline and L2 at slower latency
– often use 3-level
• size limited by access time and improved cycle times


soc 4.12

Cache partitioning: scaling effect on cache access time

• access time to a cache is approximately
  $\text{access time (ns)} = \bigl(0.35 + 3.8f + (0.006 + 0.025f)\,C\bigr) \times \bigl(1 + 0.3\,(1 - 1/A)\bigr)$
  where
– f is the feature size in microns
– C is the cache capacity in KB
– A is the associativity, e.g. direct-mapped A = 1
• for example, at f = 0.1 µm, A = 1 and C = 32 KB the access time is 1.00 ns
• problem with small feature size: cache access time, not cache size
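The access-time approximation on this slide can be evaluated directly. The sketch below is only an illustration of that formula; the function and parameter names are hypothetical and not from the slides.

```python
def cache_access_time_ns(f_microns: float, capacity_kb: float, associativity: int) -> float:
    """Approximate cache access time (ns) from the slide's empirical formula.

    f_microns     -- feature size f in microns
    capacity_kb   -- cache capacity C in KB
    associativity -- A (1 = direct-mapped)
    """
    base = 0.35 + 3.8 * f_microns + (0.006 + 0.025 * f_microns) * capacity_kb
    assoc_factor = 1 + 0.3 * (1 - 1 / associativity)
    return base * assoc_factor


if __name__ == "__main__":
    # Slide example: f = 0.1 micron, C = 32 KB, direct-mapped (about 1.00 ns)
    print(f"{cache_access_time_ns(0.1, 32, 1):.2f} ns")
```

Calling the function with a larger `capacity_kb` or `associativity` shows how quickly access time grows, which is the slide's point about access time, rather than size, being the constraint.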


soc 4.13

Minimum cache access time: 1 array; larger sizes use multiple arrays (interleaving)

L1: usually less than 64 KB

L2: usually less than 512 KB (interleaved from smaller arrays)

L3: multiple 256 KB arrays


soc 4.14

Analysis: multi-level cache miss rate

• L2 cache analysis by statistical inclusion
• if L2 cache > 4 x size of the L1 cache then
– assume statistically: contents of L1 lie in L2
• relevant L2 miss rates
– local miss rate: no. L2 misses / no. L2 references
– global miss rate: no. misses / no. processor references
– solo miss rate: no. misses without L1 / no. processor references
– inclusion => solo miss rate = global miss rate
• miss penalty calculation
– L1 miss rate x (miss in L1, hit in L2 penalty) plus
– L2 miss rate x (miss in L1, miss in L2 penalty - L1 to L2 penalty)


soc 4.15

Multi-level cache example

            L1    L2    Memory
Miss rate   4%    1%

• delays
– miss in L1, hit in L2: 2 cycles
– miss in L1, miss in L2: 15 cycles
• assume one reference/instruction
– L1 delay is 1 ref/instr x 0.04 misses/ref x 2 cycles/miss = 0.08 cpi
– L2 delay is 1 ref/instr x 0.01 misses/ref x (15 - 2) cycles/miss = 0.13 cpi
– total effect of the 2-level system is 0.08 + 0.13 = 0.21 cpi
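The miss-penalty calculation for a two-level cache can be written as a small helper. This is a minimal sketch with hypothetical names; it assumes, as in the example, that the L2 miss rate is the global rate (misses per processor reference).

```python
def two_level_cpi_penalty(refs_per_instr: float,
                          l1_miss_rate: float,
                          l2_global_miss_rate: float,
                          l1_miss_l2_hit_cycles: float,
                          l1_miss_l2_miss_cycles: float):
    """Return (L1 term, L2 term, total) in cycles per instruction."""
    # Every L1 miss pays at least the L1-to-L2 penalty.
    l1_term = refs_per_instr * l1_miss_rate * l1_miss_l2_hit_cycles
    # L2 misses pay only the extra cycles beyond the L1-to-L2 penalty.
    extra_cycles = l1_miss_l2_miss_cycles - l1_miss_l2_hit_cycles
    l2_term = refs_per_instr * l2_global_miss_rate * extra_cycles
    return l1_term, l2_term, l1_term + l2_term


if __name__ == "__main__":
    l1, l2, total = two_level_cpi_penalty(1, 0.04, 0.01, 2, 15)
    # 0.08 + 0.13 = 0.21 cpi
    print(f"L1: {l1:.2f}  L2: {l2:.2f}  total: {total:.2f} cpi")
```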


soc 4.16

Memory design

• logical inclusion

• embedded RAM

• off-die: DRAM

• basic memory model

• Strecker’s model


soc 4.17

Physical memory system


soc 4.18

Hierarchy of caches

Name  Description         Size        Access      Transfer size
L0    Registers           <256 words  <1 cycle    word
L1    Core local          <64K        <4 cycle    Line
L2    On Chip             <64M        <30 cycle   Line
L3    DRAM on Chip        <1G         <60 cycle   >= Line
M0    Off Chip Cache
M1    Local Main Memory   <16G        <150 cycle  >= Line
M2    Cluster Memory


soc 4.19

Hierarchy of caches

• working set: how much memory an “iteration” requires
• if it fits in a level then that will be the worst case
• if it does not, hit rate typically determines performance
• double the cache level size, halve the miss rate: good rule of thumb
• if 90% hit rate, 10x memory access time, performance 50%
• and that’s for 1 core
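The 90% hit-rate figure can be checked with an average-access-time calculation. The sketch below assumes that every access pays one unit of hit time and that each miss pays an additional penalty of 10 units; this interpretation and the function names are assumptions, not taken from the slides.

```python
def average_access_time(hit_rate: float, miss_penalty: float, hit_time: float = 1.0) -> float:
    """Average access time: every access pays hit_time, misses pay miss_penalty on top."""
    return hit_time + (1.0 - hit_rate) * miss_penalty


def relative_performance(hit_rate: float, miss_penalty: float, hit_time: float = 1.0) -> float:
    """Performance relative to a cache that always hits."""
    return hit_time / average_access_time(hit_rate, miss_penalty, hit_time)


if __name__ == "__main__":
    # 90% hit rate, miss costs 10x the hit time: roughly half the ideal performance
    print(f"{relative_performance(0.90, 10):.2f}")
    # Rule of thumb: doubling the level size halves the miss rate (10% -> 5%)
    print(f"{relative_performance(0.95, 10):.2f}")
```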


soc 4.20

Logical inclusion

• multiprocessors with L1 and L2 caches
– important: determining that an L1 cache does NOT contain a line
• sufficient to determine
– L2 cache does not have the line
• need to ensure
– all the contents of L1 are always in L2
• this property: logical inclusion


soc 4.21

Logical inclusion techniques

• passive
– control cache size, organization, policies
– no. L2 sets ≥ no. L1 sets
– L2 set size ≥ L1 set size
– compatible replacement algorithms
– but: highly restrictive and difficult to guarantee
• active
– whenever a line is replaced or invalidated in the L2
– ensure it is not present in L1 or it is evicted from L1
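The active technique can be illustrated with a toy model. The sketch below assumes fully associative LRU caches that track only line addresses; the class and method names are hypothetical. Because L1 hits never reach L2, L2 can pick a victim that is still hot in L1, which is exactly when the back-invalidation matters.

```python
from collections import OrderedDict


class InclusiveTwoLevelCache:
    """Toy fully associative LRU L1/L2 pair with actively enforced inclusion."""

    def __init__(self, l1_lines: int, l2_lines: int):
        self.l1_lines = l1_lines
        self.l2_lines = l2_lines
        self.l1 = OrderedDict()  # line address -> True, oldest entry first
        self.l2 = OrderedDict()

    def access(self, line: int) -> str:
        if line in self.l1:
            self.l1.move_to_end(line)  # L1 hit: L2 never sees this reference
            return "L1 hit"
        if line in self.l2:
            self.l2.move_to_end(line)
            result = "L2 hit"
        else:
            if len(self.l2) >= self.l2_lines:
                victim, _ = self.l2.popitem(last=False)  # evict LRU line from L2
                self.l1.pop(victim, None)  # active inclusion: evict it from L1 too
            self.l2[line] = True
            result = "miss"
        if len(self.l1) >= self.l1_lines:
            self.l1.popitem(last=False)  # L1 victim stays in L2
        self.l1[line] = True
        return result


if __name__ == "__main__":
    cache = InclusiveTwoLevelCache(l1_lines=2, l2_lines=4)
    for line in [0, 1, 0, 2, 0, 3, 0, 4]:
        cache.access(line)
    # Line 0 was hot in L1 but least recently used in L2, so it was back-invalidated.
    print(0 in cache.l1, set(cache.l1) <= set(cache.l2))
```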


soc 4.22

Memory system design outline

• memory chip technology
– on-die or off-die
• static versus dynamic
– SRAM versus DRAM
• access protocol: talking to memory
– synchronous vs asynchronous DRAMs
• simple memory performance model
– Strecker’s model for memory banks


soc 4.23

Why BIG memory?


soc 4.24

Memory

• many times, computation limited by memory
– not processor organization or cycle time
• memory: characterized by 3 parameters
– size
– access time: latency
– cycle time: bandwidth


soc 4.25

Embedded RAM


soc 4.26

Embedded RAM density (1)


soc 4.27

Embedded RAM density (2)


soc 4.28

Embedded RAM cycle time


soc 4.29

Embedded RAM error rates


soc 4.30

Off-die Memory Module

• module contains the DRAM chips that make up the physical memory word
• if the DRAM is organized as $2^n$ words x b bits and the memory has p bits per physical word, then the module has p/b DRAM chips
• total memory size is then $2^n$ words x p bits
• parity or Error-Correction Code (ECC) generally required for error detection and availability
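The module arithmetic can be sketched as follows. The chip organization in the usage example (2^27 words x 8 bits, 64-bit word, 72 bits with ECC) is a hypothetical illustration, not a configuration taken from the slides.

```python
def module_organization(n_address_bits: int, chip_width_bits: int, word_width_bits: int):
    """Return (number of DRAM chips, total module size in bits).

    Each chip is organized as 2**n_address_bits words x chip_width_bits (b);
    the physical memory word is word_width_bits (p) wide.
    """
    if word_width_bits % chip_width_bits != 0:
        raise ValueError("physical word width must be a multiple of the chip width")
    chips = word_width_bits // chip_width_bits               # p / b
    total_bits = (2 ** n_address_bits) * word_width_bits     # 2^n words x p bits
    return chips, total_bits


if __name__ == "__main__":
    chips, bits = module_organization(27, 8, 64)
    print(chips, bits // 8 // 2**20, "MiB")   # data-only module
    print(module_organization(27, 8, 72)[0])  # same module with 8 ECC bits per word
```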


soc 4.31

Simple asynchronous DRAM array

• DRAM cell
– capacitor: stores charge for 0/1 state
– transistor: switches capacitor to bit line
– charge decays => refresh required
• DRAM array
– stores $2^n$ bits in a square array
– $2^{n/2}$ row lines connect to data lines
– $2^{n/2}$ column bit lines connect to sense amplifiers


soc 4.32

DRAM basics

• row read is destructive
• sequence
– read row into SRAM from dynamic memory (>1000 bits)
– select word (<64 bits)
– write word into row (writing)
– repeat till done with row
– write back row into dynamic memory


soc 4.33

DRAM timing

• row and column addresses multiplexed
• row and column strobes for timing


soc 4.34

Increase DRAM bandwidth

• burst mode
– aka page mode, nibble mode, fast page mode
• synchronous DRAM (SDRAM)
• DDR SDRAM
– DDR1
– DDR2
– DDR3


soc 4.35

DDR SDRAM (Double Data Rate Synchronous DRAM)


soc 4.36

Burst mode

• burst mode
– save most recently accessed row (“page”)
– only need column address + CAS to access within page
• most DDR SDRAMs: multiple rows can be open
– address counter in each row for sequential accesses
– only need CAS (DRAM) or bus clock (SDRAM) for sequential accesses
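The benefit of keeping a row open can be illustrated by counting how often a new row must be activated for a given address stream. This is a minimal sketch assuming a single open row and a hypothetical row size of 1024 words; real devices have several banks and their own timing parameters.

```python
def count_row_activations(addresses, words_per_row: int = 1024):
    """Return (row activations, page hits) for a stream of word addresses.

    A page hit needs only a column address + CAS; any other access must
    first open (activate) a new row.
    """
    open_row = None
    activations = 0
    page_hits = 0
    for addr in addresses:
        row = addr // words_per_row
        if row == open_row:
            page_hits += 1
        else:
            activations += 1
            open_row = row
    return activations, page_hits


if __name__ == "__main__":
    # Sequential accesses stay within the open row most of the time.
    print(count_row_activations(range(2048)))
    # A stride equal to the row size opens a new row on every access.
    print(count_row_activations(range(0, 2048 * 1024, 1024)))
```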


soc 4.37

Configuration parameters

Parameters for typical DRAM chips used in a 64-bit module


soc 4.38

DRAM timing


soc 4.39

Physical memory system


soc 4.40

Basic memory model

• assume that n processors
– each make 1 request per Tc to one of m memories
• B(m,n)
– number of successes
• Tc
– memory cycle time
• one processor making n requests per Tc
– behaves as n processors making 1 request per Tc


soc 4.41

Achieved vs. offered bandwidth

• offered request rate
– rate at which processor(s) would make requests if memory had unlimited bandwidth and no contention


soc 4.42

Basic terms

• B = B(m,n) or B(m)
– number of requests that succeed each Tc (= average number of busy modules)
– B: bandwidth normalized to Tc
• Ts: more generalized term for service time
– Tc = Ts
• BW: achieved bandwidth
– in requests serviced per second
– BW = B / Ts = B(m,n) / Ts


soc 4.43

Modeling + evaluation methodology

• relevant physical parameters for memory
– word size
– module size
– number of modules
– cycle time Tc (= Ts)
• find the offered bandwidth
– number of requests/Ts
• find the bottleneck
– performance limited by most restrictive service point


soc 4.44

Strecker’s model: compute B(m,n)

• model description
– each processor generates 1 reference per cycle
– requests randomly/uniformly distributed over modules
– any busy module serves 1 request
– all unserviced requests are dropped each cycle
– assume there are no queues
• $B(m,n) = m\left[1 - (1 - 1/m)^n\right]$
• relative performance $P_{rel} = B(m,n) / n$


soc 4.45

Deriving Strecker’s model

• Prob[given processor does not reference module] = $(1 - 1/m)$
• Prob[no processor references module] = P[idle] = $(1 - 1/m)^n$
• Prob[module busy] = $1 - (1 - 1/m)^n$
• average number of busy modules is B(m,n)
• $B(m,n) = m\left[1 - (1 - 1/m)^n\right]$
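The derivation can be sanity-checked with a small Monte Carlo experiment under the model's own assumptions: each of n processors picks one of m modules uniformly at random every cycle, and unserved requests are dropped. The function names, cycle count, and seed below are hypothetical choices for illustration.

```python
import random


def strecker_bandwidth(m: int, n: float) -> float:
    """Expected number of busy modules per cycle: m * (1 - (1 - 1/m)**n)."""
    return m * (1 - (1 - 1 / m) ** n)


def simulate_busy_modules(m: int, n: int, cycles: int = 20000, seed: int = 1) -> float:
    """Average number of distinct modules referenced per cycle."""
    rng = random.Random(seed)
    total_busy = 0
    for _ in range(cycles):
        # Each processor issues one request; a module is busy if anyone hit it.
        referenced = {rng.randrange(m) for _ in range(n)}
        total_busy += len(referenced)
    return total_busy / cycles


if __name__ == "__main__":
    m, n = 4, 4
    print(f"formula:    {strecker_bandwidth(m, n):.3f}")
    print(f"simulation: {simulate_busy_modules(m, n):.3f}")
```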


soc 4.46

Example 1

• 2 dual-core processor dice share memory
– Ts = 24 ns
• each die has 2 processors
– sharing 4MB L2
– miss rate is 0.001 misses/reference
– each processor makes 3 references/cycle @ 4 GHz
• offered rate: 2 x 2 x 3 x 0.001 = 0.012 refs/cycle
• Ts = 4 x 24 = 96 cycles
• n = 0.012 x 96 = 1.152 processor requests / Ts; if m = 4
• success rate B(m,n) = B(4, 1.152) = 4[1 - (0.75)^1.152] ≈ 1.13
• relative performance = B/n = 1.13/1.152 ≈ 0.98
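The steps of this example (offered request rate, conversion of Ts to cycles, then Strecker's formula) can be reproduced numerically. The sketch below uses hypothetical function and parameter names and evaluates B(m,n) = m[1 - (1 - 1/m)^n] exactly as stated in the model, with a fractional n.

```python
def strecker_bandwidth(m: int, n: float) -> float:
    """Strecker's model: expected requests served per service time Ts."""
    return m * (1 - (1 - 1 / m) ** n)


def offered_requests_per_ts(dice: int, cores_per_die: int, refs_per_cycle: float,
                            miss_rate: float, clock_ghz: float, ts_ns: float) -> float:
    """Memory requests offered by all cores during one memory service time."""
    requests_per_cycle = dice * cores_per_die * refs_per_cycle * miss_rate
    ts_cycles = clock_ghz * ts_ns  # GHz x ns = processor cycles per Ts
    return requests_per_cycle * ts_cycles


if __name__ == "__main__":
    n = offered_requests_per_ts(dice=2, cores_per_die=2, refs_per_cycle=3,
                                miss_rate=0.001, clock_ghz=4, ts_ns=24)
    b = strecker_bandwidth(4, n)
    print(f"n = {n:.3f}, B(4, n) = {b:.3f}, relative performance = {b / n:.3f}")
```

The same two functions cover other configurations, for instance more memory modules or a longer service time, by changing the arguments.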


soc 4.47

Example 2

• 8-way interleaved associative data cache
• processor issues 2 LD/ST per cycle
– each processor: data reference per cycle = 0.6
– n = 2 x 0.6 = 1.2; m = 8
– B(m,n) = B(8, 1.2) = 1.18
• relative performance = B/n = 1.18/1.2 = 0.98


soc 4.48

Summary

• cache
– performance, cache partitioning, multi-level cache
• memory chip technology
– on-die or off-die
• static versus dynamic
– SRAM versus DRAM
• access protocol: talking to memory
– synchronous vs asynchronous DRAMs
• simple memory performance model
– Strecker’s model for memory banks