CSC 252: Computer Organization Spring 2020: Lecture 17 · 2020-03-31

Instructor: Yuhao Zhu

Department of Computer Science, University of Rochester

Carnegie Mellon

Announcement

• Recall: 75 full score + 20 extra credit
• Max: 90
• Min: 11.3
• Median: 58.25
• Mean: 57.71
• Standard Deviation: 16.23

• Mid-term grade distribution

[Figure: histogram of mid-term grades; score bins 85+, 75+, 65+, 55+, 45+, 35+, 25+, 0-25]

• Point distribution:
  • 8% per lab: 40% in total
  • 22% for mid-term
  • 38% for final
• If you can’t make an office hour (mine or the TAs’), feel free to email and/or schedule a different time at your convenience

Cache Illustrations

• Memory (big but slow) holds addresses 0 through 15:

  0 1 2 3
  4 5 6 7
  8 9 10 11
  12 13 14 15

• Cache (small but fast) sits between the CPU and memory. It currently holds the data for addresses 8, 9, 14, and 3.
• Hit: data in address b is needed. The CPU requests the data at address 14. Address 14 is in the cache: Hit! The cache returns the data.
• Miss: data in address b is needed. The CPU requests the data at address 12. Address 12 is not in the cache: Miss! The request for address 12 goes to memory, address 12 is fetched from memory, and address 12 is stored in the cache.
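The hit and miss cases in the illustration can be sketched as a toy model. This is only a sketch under stated assumptions: the class name `TinyCache`, the placeholder data strings, and the FIFO eviction rule are illustrative choices, since the slides do not say which cached address is replaced on a miss.

```python
from collections import OrderedDict


class TinyCache:
    """Toy cache holding at most `capacity` addresses.

    Assumption: when full, the oldest cached address is evicted (FIFO).
    """

    def __init__(self, capacity, memory):
        self.capacity = capacity
        self.memory = memory        # backing store: address -> data
        self.lines = OrderedDict()  # cached address -> data

    def read(self, addr):
        if addr in self.lines:
            return self.lines[addr], "hit"
        data = self.memory[addr]    # slow path: fetch from memory
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)  # evict the oldest cached address
        self.lines[addr] = data     # store the fetched address in the cache
        return data, "miss"


memory = {addr: f"data@{addr}" for addr in range(16)}
cache = TinyCache(capacity=4, memory=memory)
for addr in (8, 9, 14, 3):          # warm the cache to match the illustration
    cache.read(addr)

print(cache.read(14))  # address 14 is already cached
print(cache.read(12))  # address 12 is not cached, so it is fetched and stored
print(cache.read(12))  # the same request is now served from the cache
```

A real cache tracks tags and sets rather than whole addresses in a dictionary; the model only shows the control flow of a hit versus a miss.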

Fully Associative Cache

• Every memory location can be mapped to any cache line in the cache.
• Given a request to address A from the CPU, detecting cache hit/miss requires:
  • Comparing address A with all four tags in the cache (a.k.a., associative search)
• A cache line: content + valid bit + tag bits
  • Valid bit + tag bits are “overhead”
  • Content is what you really want to store
  • But we need valid and tag bits to correctly access the cache

[Figure: memory with addresses 0000 through 1111, next to a four-line cache with columns Content, Valid?, Tag]

Content  Tag
0xEF     1000
0xAC     1001
0x06     1010
0x70     1101
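The associative search can be sketched as a loop over every line. This is a minimal sketch, assuming all four lines are valid (the valid-bit values are not readable in the transcript); the names `LINES` and `lookup_fully_associative` are illustrative.

```python
# Each cache line is (valid, tag, content).
# In a fully associative cache the tag is the entire 4-bit address.
LINES = [
    (True, 0b1000, 0xEF),
    (True, 0b1001, 0xAC),
    (True, 0b1010, 0x06),
    (True, 0b1101, 0x70),
]


def lookup_fully_associative(lines, addr):
    """Return (hit, content, comparisons) after checking every line."""
    comparisons = 0
    hit, content = False, None
    for valid, tag, data in lines:
        comparisons += 1            # hardware performs these compares in parallel
        if valid and tag == addr:
            hit, content = True, data
    return hit, content, comparisons


print(lookup_fully_associative(LINES, 0b1010))  # address present in the cache
print(lookup_fully_associative(LINES, 0b1011))  # address absent from the cache
```

Either way, all four tags take part in the comparison, which is the cost the later slides trade against hit rate.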

2-Way Associative Cache

• 4 cache lines are organized into two sets; each set has 2 cache lines (i.e., 2 ways)
• Even addresses go to the first set and odd addresses go to the second set
• Each address can be mapped to either cache line in the same set
  • Using the LSB to find the set (i.e., odd vs. even)
  • Tag now stores the higher 3 bits instead of the entire address

[Figure: memory with addresses 0000 through 1111, next to a cache with two sets and columns Content, Valid?, Tag]

Set  Content  Tag
0    0xEF     100
0    0x06     101
1    0xAC     100
1    0x70     110

• Given a request to address, say 1011, from the CPU, detecting cache hit/miss requires:
  • Using the LSB to index into the cache and find the corresponding set, in this case set 1
  • Then do an associative search in that set, i.e., compare the highest 3 bits 101 with both tags in set 1
  • Only two comparisons required
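The two-step lookup (index with the LSB, then search within the set) can be sketched as follows. This is a sketch under assumptions: the set contents follow the even/odd rule applied to the four cached addresses, the valid bits are assumed set, and `SETS` and `lookup_two_way` are illustrative names.

```python
# SETS[index] lists the ways of that set as (valid, tag, content).
SETS = [
    [(True, 0b100, 0xEF), (True, 0b101, 0x06)],  # set 0: even addresses 1000, 1010
    [(True, 0b100, 0xAC), (True, 0b110, 0x70)],  # set 1: odd addresses 1001, 1101
]


def lookup_two_way(sets, addr):
    """Return (hit, index, tag, content) for a 4-bit address."""
    index = addr & 0b1   # the LSB selects the set
    tag = addr >> 1      # the upper 3 bits form the tag
    for valid, stored_tag, data in sets[index]:
        if valid and stored_tag == tag:  # at most two comparisons
            return True, index, tag, data
    return False, index, tag, None


print(lookup_two_way(SETS, 0b1011))  # set 1, tag 101: compared with tags 100 and 110
print(lookup_two_way(SETS, 0b1101))  # set 1, tag 110: matches the second way
```

With these assumed contents, the request to 1011 misses because neither tag in set 1 equals 101.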

Direct-Mapped (1-way Associative) Cache

• 4 cache lines are organized into four sets
• Each memory location can only be mapped to one set
  • Using the 2 LSBs to find the set
  • Tag now stores the higher 2 bits

[Figure: memory with addresses 0000 through 1111, next to a cache with four sets and columns Content, Valid?, Tag]

Set  Content  Tag
00   0xEF     10
01   0xAC     10
10   0x06     10
11   (empty)

• Given a request to address, say 1101, from the CPU, detecting cache hit/miss requires:
  • Using the 2 LSBs to index into the cache and find the set, in this case set 01
  • Then do an associative search in that set, i.e., compare the highest 2 bits 11 in the address with the tag in set 01 -> miss
  • Only one comparison required
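The direct-mapped lookup needs only one tag comparison, as the following sketch shows. It assumes that sets 00, 01, and 10 hold valid lines with tag 10 and that set 11 is empty, which is how the figure reads; `DM_LINES` and `lookup_direct_mapped` are illustrative names.

```python
# DM_LINES[index] is the single line of that set: (valid, tag, content).
DM_LINES = {
    0b00: (True, 0b10, 0xEF),   # address 1000
    0b01: (True, 0b10, 0xAC),   # address 1001
    0b10: (True, 0b10, 0x06),   # address 1010
    0b11: (False, 0b00, None),  # assumed empty
}


def lookup_direct_mapped(lines, addr):
    """Return (hit, index, tag, content) for a 4-bit address."""
    index = addr & 0b11  # the 2 LSBs select the set
    tag = addr >> 2      # the upper 2 bits form the tag
    valid, stored_tag, data = lines[index]
    hit = valid and stored_tag == tag  # the only comparison needed
    return hit, index, tag, (data if hit else None)


print(lookup_direct_mapped(DM_LINES, 0b1101))  # set 01, tag 11 vs. stored tag 10
print(lookup_direct_mapped(DM_LINES, 0b1001))  # set 01, tag 10 matches
```

Address 1101 competes with 1001 for set 01, which is why it misses here even though the cache still has an unused line in set 11.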

Associative versus Direct Mapped Trade-offs

• Direct-Mapped cache
  • Generally lower hit rate
  • Simpler, Faster
• Associative cache
  • Generally higher hit rate. Better utilization of cache resources
  • Slower and higher power consumption. Why?

[Figure: hit-detection logic. Direct-mapped: addr[1:0] selects one of the four sets (00, 01, 10, 11), and a single comparator (=) checks addr[3:2] against that set's tag to produce Hit?. 2-way associative: addr[0] selects one of the two sets (0, 1), two comparators (=) check addr[3:1] against both tags in the set, and their outputs are ORed to produce Hit?.]

[Figure: Miss rate versus cache size on the Integer portion of SPEC CPU2000]

Cache Organization

• Finding a name in a roster
• If the roster is completely unorganized
  • Need to compare the name with all the names in the roster
  • Same as a fully-associative cache
• If the roster is ordered by last name, and within the same last name different first names are unordered
  • First find the last name group
  • Then compare the first name with all the first names in the same group
  • Same as a set-associative cache

Cache Access Summary (So far…)

• Assuming b bits in a memory address
• The b bits are split into two fields:
  • Lower s bits are used as the index to find a set. Total sets S = 2^s
  • The higher (b - s) bits are used for the tag
• Associativity n (i.e., the number of ways in a cache set) is independent of the split between index and tag

Memory address layout (bit b on the left, bit 0 on the right):

| tag (b - s bits) | index (s bits) |

Locality again

• So far: temporal locality
• What about spatial?
• Idea: Each cache location (cache line) stores multiple bytes

Cache-Line Size of 2

• Read 1000
• Read 1001 (Hit!)
• Read 1010
• Read 1011 (Hit!)

[Figure: memory (addresses 0000 through 1111) holds a, b, c, d at consecutive addresses, and each cache line holds two of them. Reading 1000 brings a and b into one cache line, so the read of 1001 hits. Reading 1010 brings c and d into another cache line, so the read of 1011 hits.]
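The read sequence above can be replayed in a small direct-mapped model. This is a sketch under assumptions: a 4-bit address, a 1-bit offset (line size 2), and a 1-bit index (two sets); the number of sets and the name `simulate_reads` are illustrative choices, not taken from the slides.

```python
def simulate_reads(addresses, offset_bits=1, index_bits=1):
    """Replay reads on a direct-mapped cache and report hit or miss for each."""
    lines = {}      # index -> tag of the line currently stored in that set
    results = []
    for addr in addresses:
        index = (addr >> offset_bits) & ((1 << index_bits) - 1)
        tag = addr >> (offset_bits + index_bits)
        if lines.get(index) == tag:
            results.append("hit")
        else:
            lines[index] = tag   # a miss fetches the whole line
            results.append("miss")
    return results


reads = [0b1000, 0b1001, 0b1010, 0b1011]
print(simulate_reads(reads))                               # line size 2
print(simulate_reads(reads, offset_bits=0, index_bits=2))  # line size 1
```

With two bytes per line, the sequence alternates miss, hit, miss, hit. With one byte per line, every read in this sequence misses, which is the spatial-locality benefit the slide illustrates.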

Cache Access Summary

• Assuming b bits in a memory address
• The b bits are split into three fields:
  • Lower l bits are used for byte offset within a cache line. Cache line size L = 2^l
  • Next s bits are used as the index to find a set. Total sets S = 2^s
  • The higher (b - l - s) bits are used for the tag
• Associativity n is independent of the split between index and tag

Memory address layout (bit b on the left, bit 0 on the right):

| tag (b - l - s bits) | index (s bits) | offset (l bits) |
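The three-field split can be written with shifts and masks. This is a minimal sketch; the function name `split_address` and the example parameter values are illustrative, while b, s, and l follow the slide's notation.

```python
def split_address(addr, b, s, l):
    """Split a b-bit address into (tag, index, offset).

    l: number of offset bits (line size L = 2**l)
    s: number of index bits  (number of sets S = 2**s)
    """
    if not 0 <= addr < (1 << b):
        raise ValueError("address does not fit in b bits")
    offset = addr & ((1 << l) - 1)        # lowest l bits
    index = (addr >> l) & ((1 << s) - 1)  # next s bits
    tag = addr >> (l + s)                 # remaining b - l - s bits
    return tag, index, offset


# Direct-mapped example from earlier: 4-bit address, 2 index bits, no offset.
print(split_address(0b1101, b=4, s=2, l=0))
# Line size 2 with two sets: 1 offset bit and 1 index bit.
print(split_address(0b1011, b=4, s=1, l=1))
```

Associativity does not appear in the function at all, which matches the last bullet: it changes how many tags are compared inside a set, not how the address is divided.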

Handling Reads

• Read miss: Put into cache
  • Any reason not to put into cache?
• Read hit: Nothing special. Enjoy the hit!

Handling Writes (Hit)

• Intricacy: data value is modified!
  • Implication: value in cache will be different from that in memory!
• When do we write the modified data in a cache to the next level?
  • Write through: At the time the write happens
  • Write back: When the cache line is evicted
• Write-back
  • + Can consolidate multiple writes to the same block before eviction. Potentially saves bandwidth between cache and memory + saves energy
  • - Need a bit in the tag store indicating the block is “dirty/modified”
• Write-through
  • + Simpler
  • + Memory is up to date
  • - More bandwidth intensive; no coalescing of writes
  • - Requires transfer of the whole cache line (although only one byte might have been modified)
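The bandwidth difference between the two policies can be sketched by counting writes that reach memory. This is a simplified model under assumptions: every write hits in the cache, all lines are evicted at the end, and the function name `memory_writes` is illustrative.

```python
def memory_writes(write_sequence, policy):
    """Count writes sent to memory for a sequence of write hits.

    write_sequence: cache-line identifiers, one per write.
    All lines are assumed to be evicted after the sequence ends.
    """
    dirty = set()
    writes = 0
    for line in write_sequence:
        if policy == "write-through":
            writes += 1      # every write goes to memory immediately
        elif policy == "write-back":
            dirty.add(line)  # only set the line's dirty bit
        else:
            raise ValueError(f"unknown policy: {policy}")
    writes += len(dirty)     # eviction: each dirty line is written back once
    return writes


sequence = ["A", "A", "A", "B"]
print(memory_writes(sequence, "write-through"))
print(memory_writes(sequence, "write-back"))
```

For this sequence, write-through sends four writes to memory while write-back sends two, one per dirty line, which is the consolidation the write-back bullet refers to.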


Handling Writes (Miss)
• Do we allocate a cache line on a write miss?
  • Write-allocate: Allocate on write miss
  • Non-Write-Allocate: No-allocate on write miss
• Allocate on write miss
  • + Can consolidate writes instead of writing each of them individually to memory
  • + Simpler because write misses can be treated the same way as read misses
• Non-allocate
  • + Conserves cache space if locality of writes is low (potentially better cache hit rate)
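
A rough sketch of the two write-miss policies follows. It assumes an idealized cache with unlimited capacity, write-back behavior on hits, and a final flush of dirty blocks; the function name `simulate` and the `("R"/"W", block)` access format are made up for illustration.

```python
def simulate(write_allocate, accesses):
    """Return (hits, memory_writes, blocks_cached) for a list of accesses.

    accesses is a list of (op, block) pairs, where op is "R" or "W".
    Assumptions: unlimited capacity, write-back on hits, and all dirty
    blocks are written to memory once at the end.
    """
    cached, dirty = set(), set()
    hits = memory_writes = 0
    for op, block in accesses:
        if block in cached:
            hits += 1
            if op == "W":
                dirty.add(block)        # write hit: just mark the line dirty
        elif op == "R" or write_allocate:
            cached.add(block)           # miss that allocates a line
            if op == "W":
                dirty.add(block)
        else:
            memory_writes += 1          # no-allocate: write goes straight to memory
    memory_writes += len(dirty)         # final flush of dirty lines
    return hits, memory_writes, len(cached)


repeated = [("W", 7)] * 4                   # high write locality
scattered = [("W", b) for b in range(4)]    # low write locality

print(simulate(True, repeated))    # (3, 1, 1)
print(simulate(False, repeated))   # (0, 4, 0)
print(simulate(True, scattered))   # (0, 4, 4)
print(simulate(False, scattered))  # (0, 4, 0)
```

When the writes repeat on one block, allocating consolidates four writes into one memory transfer. When the writes are scattered, allocating saves nothing and occupies four lines that no-allocate leaves free.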


Instruction vs. Data Caches
• Separate or Unified?
• Unified:
  • + Dynamic sharing of cache space: no overprovisioning that might happen with static partitioning (i.e., split Inst and Data caches)
  • - Instructions and data can thrash each other (i.e., no guaranteed space for either)
  • - Inst and Data are accessed in different places in the pipeline. Where do we place the unified cache for fast access?
• First level caches are almost always split
  • Mainly for the last reason above
• Second and higher levels are almost always unified


Eviction/Replacement Policy
• Which cache line should be replaced?
  • Direct mapped? Only one place!
  • Associative caches? Multiple places!
• For associative cache:
  • Any invalid cache line first
  • If all are valid, consult the replacement policy
  • Randomly pick one???
  • Ideally: Replace the cache line that is least likely to be used again
  • Approximation: Least recently used (LRU)
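
The victim-selection order described above (invalid line first, otherwise ask the replacement policy) might be sketched as below. The per-way timestamp list `last_used` is an assumption used only to express LRU in software; hardware would not normally store full timestamps.

```python
def choose_victim(valid, last_used):
    """Pick the way to replace within one set.

    valid[i]     -- whether way i currently holds a valid line
    last_used[i] -- time of the most recent access to way i
                    (larger means more recent; hypothetical bookkeeping)
    """
    # Step 1: any invalid line is a free slot, so use it first.
    for way, is_valid in enumerate(valid):
        if not is_valid:
            return way
    # Step 2: all lines are valid, so fall back to the policy (here: LRU).
    return min(range(len(valid)), key=lambda way: last_used[way])


print(choose_victim([True, False, True, True], [5, 0, 9, 2]))  # 1 (invalid way)
print(choose_victim([True, True, True, True], [5, 7, 9, 2]))   # 3 (oldest access)
```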


Implementing LRU
• Idea: Evict the least recently accessed block
• Challenge: Need to keep track of access ordering of blocks
• Question: 2-way set associative cache:
  • What do you need to implement LRU perfectly? One bit?

[Figure: a 2-way set with cache lines 0 and 1 and a 1-bit LRU index]

Address stream (LRU index after each access):
• Hit on 0 (LRU index: 1)
• Hit on 1 (LRU index: 0)
• Miss, evict 0 (LRU index: 1)
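
For the 2-way case, one bit per set is enough, because the line that was not just accessed is by definition the least recently used. A minimal sketch, with `TwoWaySet` as an illustrative name:

```python
class TwoWaySet:
    """One set of a 2-way set associative cache with a 1-bit LRU index."""

    def __init__(self):
        self.lru = 0              # index of the least recently used line

    def access(self, way):
        # After touching `way`, the other line becomes the LRU one.
        self.lru = 1 - way

    def victim(self):
        return self.lru


s = TwoWaySet()
trace = []
s.access(0); trace.append(s.lru)   # hit on 0
s.access(1); trace.append(s.lru)   # hit on 1
evicted = s.victim()               # miss: evict the LRU line
s.access(evicted); trace.append(s.lru)   # the refilled line is now most recent
print(evicted, trace)              # 0 [1, 0, 1]
```

The trace reproduces the address stream above: the LRU index goes 1, 0, and the miss evicts line 0, after which the index points at line 1 again.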


Implementing LRU
• Question: 4-way set associative cache:
  • What do you need to implement LRU perfectly?
  • Will the same mechanism work?
  • Essentially have to track the ordering of all cache lines
  • How many possible orderings are there?
  • What are the hardware structures needed?
  • In reality, true LRU is almost never implemented at higher associativity. Too complex.
  • Google “Pseudo-LRU”

[Figure: a 4-way set with cache lines 0, 1, 2, 3 and a 2-bit LRU index, currently 1]

Address stream:
• Hit on 0
• Hit on 2
• Hit on 3
• Miss, evict 1

What to update now???
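
To make the 4-way questions concrete, the sketch below replays the address stream with two trackers. `TrueLRUSet` keeps the full recency order (its starting order `[1, 0, 2, 3]` is an assumption chosen so that line 1 starts as LRU, as in the figure). `TreePLRUSet` is one common pseudo-LRU variant, tree-based PLRU with 3 bits; the slide does not say which variant it has in mind, so treat it as an illustration only.

```python
import math


def lru_state_bits(ways):
    """Bits needed to encode every possible recency ordering of `ways` lines."""
    return math.ceil(math.log2(math.factorial(ways)))


class TrueLRUSet:
    def __init__(self, order):
        # order[0] is the least recently used way, order[-1] the most recent.
        self.order = list(order)

    def access(self, way):
        self.order.remove(way)
        self.order.append(way)

    def victim(self):
        return self.order[0]


class TreePLRUSet:
    """Tree-based pseudo-LRU for 4 ways, using 3 bits instead of a full order."""

    def __init__(self):
        self.root = 0    # 0: victim is in ways {0, 1}; 1: victim is in ways {2, 3}
        self.left = 0    # 0: way 0 is the candidate; 1: way 1
        self.right = 0   # 0: way 2 is the candidate; 1: way 3

    def access(self, way):
        # Point every bit on the path away from the way just accessed.
        if way < 2:
            self.root = 1
            self.left = 1 - way
        else:
            self.root = 0
            self.right = 3 - way

    def victim(self):
        if self.root == 0:
            return self.left
        return 2 + self.right


true_lru = TrueLRUSet([1, 0, 2, 3])
plru = TreePLRUSet()
for way in (0, 2, 3):                      # hit on 0, hit on 2, hit on 3
    true_lru.access(way)
    plru.access(way)

print(true_lru.victim(), plru.victim())    # 1 1  (both evict line 1)
true_lru.access(1)                         # refill line 1 after the miss
plru.access(1)
print(true_lru.victim(), plru.victim())    # 0 2  (PLRU only approximates LRU)
print(math.factorial(4), lru_state_bits(4))  # 24 5
```

A 4-way set has 4! = 24 possible orderings, so perfect LRU needs at least 5 bits of state per set plus logic to update the order on every access. The tree variant uses 3 bits and agrees with true LRU on the first eviction, but picks line 2 rather than line 0 on the next one.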