Post on 21-Dec-2015
Caching II
Andreas Klappenecker
CPSC321 Computer Architecture
Verilog Questions & Answers
Verilog Q&A
How is the xor instruction encoded?
  R-format instruction, function field 0x26
  See [PH] page A-59
What is the purpose of Idealmem.v?
  It models the memory
  dmeminit.v initializes data memory
  imeminit.v initializes instruction memory
Verilog Q&A
How do I specify delays?
`define DEL 10
begin
  a <= #(`DEL) b;
  c <= #(`DEL) d;
end
Delays can be inserted anywhere in an assignment
Delays
module iab;
  integer i, j;
  initial begin
    i = 3;
    j = 4;
    begin
      #1 i = #1 j;
      #1 j = #1 i;
    end
  end
endmodule
Simulation starts:
@time 0: i=3, j=4
Simulation continues until the first delay #1 and waits until time 1.
@time 1: j is sampled
@time 2: 4 is assigned to i
Continue with the next statement
@time 3: i is sampled
@time 4: 4 is assigned to j
Delays
module ianb;
  integer i, j;
  initial begin
    i = 3;
    j = 4;
    begin
      i <= #1 j;
      j <= #1 i;
    end
  end
endmodule
@time 0: i=3, j=4
Both non-blocking assignments finish executing at time 0
[intra-assignment delays do not delay the execution of the statement]
  sample j and schedule the assignment to i at time 1
  sample i and schedule the assignment to j at time 1
@time 1: i = 4, j = 3
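The two timelines above can be replayed with a small model. The following Python sketch is not Verilog and not an event-driven simulator; it only mimics, under simplifying assumptions, when each right-hand side is sampled and when each left-hand side is updated. The function names run_blocking and run_nonblocking are made up for this illustration.

```python
def run_blocking(i=3, j=4):
    """Model '#1 i = #1 j; #1 j = #1 i;' with blocking assignments."""
    t = 0
    log = [(t, i, j)]
    t += 1            # leading #1: wait before evaluating the statement
    sampled = j       # right-hand side is sampled at time 1
    t += 1            # intra-assignment #1: wait before updating i
    i = sampled
    log.append((t, i, j))
    t += 1            # the next statement starts only after the first finished
    sampled = i       # sampled at time 3, when i is already 4
    t += 1
    j = sampled
    log.append((t, i, j))
    return log


def run_nonblocking(i=3, j=4):
    """Model 'i <= #1 j; j <= #1 i;' with non-blocking assignments."""
    log = [(0, i, j)]
    # Both right-hand sides are sampled at time 0, before any update.
    scheduled = {"i": j, "j": i}
    # Both scheduled updates take effect together at time 1.
    i, j = scheduled["i"], scheduled["j"]
    log.append((1, i, j))
    return log


print(run_blocking())      # [(0, 3, 4), (2, 4, 4), (4, 4, 4)]
print(run_nonblocking())   # [(0, 3, 4), (1, 4, 3)]
```

Each tuple is (time, i, j). The blocking version loses the original value of i, while the non-blocking version swaps the two values.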
Delays
Hint: Using unit delays simplifies debugging
It allows you to find out which signal depends on which
Do not hard-code delays in the form #1; rather use
`define foo_del 1 // Change later
a <= #(`foo_del) b;
Clock
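module m555 (CLK);
  // STime: start time; Ton, Toff: lengths of the high and low phases
  parameter STime = 0, Ton = 50, Toff = 50, Tcc = Ton + Toff;
  output CLK;
  reg CLK;
  initial begin
    #STime CLK = 0;    // drive the clock low at the start time
  end
  always begin
    #Toff CLK = ~CLK;  // low phase lasts Toff
    #Ton CLK = ~CLK;   // high phase lasts Ton
  end
endmodule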
Project
For jal and jr, the datapath of the book is not enough
You need more control signals for ALUop, so there is no point in sticking to the way it is done in the book
Report
Include a table explaining your control signals
Caching
Memory
Users want large and fast memories
  SRAM is too expensive for main memory
  DRAM is too slow for many purposes
Compromise: build a memory hierarchy
[Figure: levels in the memory hierarchy (CPU, Level 1, Level 2, ..., Level n); access time and the size of the memory at each level increase with distance from the CPU]
Locality
Temporal locality
  A referenced item will be referenced again soon
Spatial locality
  Nearby data will be referenced soon
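Direct Mapped Cache
Mapping: address modulo the number of blocks in the cache, x -> x mod B
[Figure: a cache with 8 blocks (indices 000 to 111) and memory addresses 00001, 00101, 01001, 01101, 10001, 10101, 11001, 11101; addresses ending in 001 map to cache block 001, addresses ending in 101 map to cache block 101]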
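The mapping x -> x mod B can be checked on the addresses from the figure. This is a minimal Python sketch, assuming B = 8 blocks as the figure suggests; the names NUM_BLOCKS and cache_index are chosen for this illustration only.

```python
NUM_BLOCKS = 8  # B; assumed to be a power of two, as in the 8-entry figure


def cache_index(block_address, num_blocks=NUM_BLOCKS):
    """Direct mapped placement: x -> x mod B."""
    return block_address % num_blocks


addresses = [0b00001, 0b00101, 0b01001, 0b01101,
             0b10001, 0b10101, 0b11001, 0b11101]
for x in addresses:
    # For a power-of-two B, x mod B equals the low log2(B) bits of x.
    assert cache_index(x) == x & (NUM_BLOCKS - 1)
    print(f"{x:05b} -> {cache_index(x):03b}")
```

Because B is a power of two, no division is needed in hardware: the index is simply the low-order three bits of the address.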
Direct Mapped Cache
Cache with 1024 = 2^10 words
  The tag from the cache is compared against the upper portion of the address
  If tag = upper 20 bits and the valid bit is set, then we have a cache hit; otherwise it is a cache miss
What kind of locality are we taking advantage of?
[Figure: 32-bit address (showing bit positions) split into a 20-bit tag (bits 31-12), a 10-bit index (bits 11-2), and a 2-bit byte offset (bits 1-0); 1024 cache entries (0 to 1023), each with a valid bit, a 20-bit tag, and 32 bits of data; outputs Hit and Data]
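The hit test for the 1024-word cache can be sketched in a few lines. This Python model assumes the field layout of the figure (20-bit tag, 10-bit index, 2-bit byte offset); the names split_address, DirectMappedCache, lookup, and fill are hypothetical and exist only for this illustration.

```python
INDEX_BITS = 10   # 1024 = 2^10 one-word entries
OFFSET_BITS = 2   # byte offset within a 32-bit word


def split_address(addr):
    """Return (tag, index, byte_offset) of a 32-bit byte address."""
    byte_offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)   # upper 20 bits
    return tag, index, byte_offset


class DirectMappedCache:
    def __init__(self):
        n = 1 << INDEX_BITS
        self.valid = [False] * n
        self.tag = [0] * n
        self.data = [0] * n

    def lookup(self, addr):
        """Return (hit, data); data is None on a miss."""
        tag, index, _ = split_address(addr)
        hit = self.valid[index] and self.tag[index] == tag
        return hit, (self.data[index] if hit else None)

    def fill(self, addr, word):
        """Install a word fetched from memory after a miss."""
        tag, index, _ = split_address(addr)
        self.valid[index], self.tag[index], self.data[index] = True, tag, word


cache = DirectMappedCache()
print(cache.lookup(0x1234))         # (False, None): valid bit not set
cache.fill(0x1234, 42)
print(cache.lookup(0x1234))         # (True, 42)
print(cache.lookup(0x1234 + 4096))  # (False, None): same index, other tag
```

With one word per entry, a hit requires that the same word was referenced before, so this organization exploits temporal locality only.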
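Direct Mapped Cache
Taking advantage of spatial locality:
[Figure: 32-bit address (showing bit positions) split into a 16-bit tag (bits 31-16), a 12-bit index (bits 15-4), a 2-bit block offset (bits 3-2), and a 2-bit byte offset (bits 1-0); 4K entries, each with a valid bit V, a 16-bit tag, and 128 bits of data; a mux selects one of the four 32-bit words; outputs Hit and Data]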
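The benefit of four-word blocks shows up on a sequential access pattern. The Python sketch below models only the tags of a direct mapped cache and counts misses; it assumes 32-bit words, a power-of-two block size, and an initially empty cache. The names count_misses and split_address are made up for this illustration.

```python
def count_misses(addresses, index_bits, words_per_block):
    """Count misses of a direct mapped cache on a trace of byte addresses.

    Only the tags are modelled; words_per_block must be a power of two.
    """
    block_bytes = 4 * words_per_block
    num_blocks = 1 << index_bits
    tags = [None] * num_blocks          # None means the valid bit is clear
    misses = 0
    for addr in addresses:
        block_number = addr // block_bytes   # drops byte and block offset
        index = block_number % num_blocks
        tag = block_number // num_blocks
        if tags[index] != tag:
            misses += 1
            tags[index] = tag                # fetch the whole block
    return misses


def split_address(addr):
    """Fields for the 4K-entry, four-word-block cache in the figure."""
    return {
        "tag": addr >> 16,
        "index": (addr >> 4) & 0xFFF,
        "block_offset": (addr >> 2) & 0x3,   # drives the 4-to-1 mux
        "byte_offset": addr & 0x3,
    }


trace = [4 * k for k in range(64)]           # 64 consecutive words
print(count_misses(trace, index_bits=12, words_per_block=1))  # 64
print(count_misses(trace, index_bits=12, words_per_block=4))  # 16
print(split_address(0x00012348))
```

On this trace every fourth access misses with four-word blocks, because each miss also brings in the three neighboring words.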
Cache Hits and Misses
Read hits
  this is what we want!
Read misses
  stall the CPU, fetch block from memory, deliver to cache, restart
Write hits:
  can replace data in cache and memory (write-through)
  write the data only into the cache (write-back the cache later)
Write misses:
  read the entire block into the cache, then write the word
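The difference between write-through and write-back can be made concrete by counting memory writes. The following Python sketch is a toy model under strong assumptions: one-word blocks (so reading the entire block on a write miss means fetching a single word), a dict standing in for main memory, and word addresses instead of byte addresses. The class TinyCache and its members are hypothetical.

```python
class TinyCache:
    """Direct mapped cache of one-word blocks over a dict-based memory."""

    def __init__(self, memory, num_blocks=4, write_back=False):
        self.memory = memory
        self.num_blocks = num_blocks
        self.write_back = write_back
        self.lines = [None] * num_blocks   # each line: [tag, word, dirty]
        self.memory_writes = 0

    def _line_for(self, word_addr):
        """Return the line holding word_addr, fetching it on a miss."""
        index, tag = word_addr % self.num_blocks, word_addr // self.num_blocks
        line = self.lines[index]
        if line is None or line[0] != tag:
            if line is not None and line[2]:
                # Write-back: a dirty victim is copied to memory first.
                victim_addr = line[0] * self.num_blocks + index
                self.memory[victim_addr] = line[1]
                self.memory_writes += 1
            line = [tag, self.memory.get(word_addr, 0), False]
            self.lines[index] = line
        return line

    def read(self, word_addr):
        return self._line_for(word_addr)[1]

    def write(self, word_addr, word):
        line = self._line_for(word_addr)   # write miss: fetch block first
        line[1] = word
        if self.write_back:
            line[2] = True                 # memory is updated on eviction
        else:
            self.memory[word_addr] = word  # write-through: update memory now
            self.memory_writes += 1


for policy in (False, True):
    cache = TinyCache({}, write_back=policy)
    for value in range(10):
        cache.write(0, value)              # ten writes to the same word
    cache.read(4)                          # maps to the same line, evicts it
    print("write-back" if policy else "write-through",
          cache.memory_writes, cache.memory[0])
```

Both policies leave the value 9 in memory, but write-through performs ten memory writes where write-back performs one, at the cost of a dirty bit and extra work on eviction.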
What Block Size?
A large block size reduces cache misses
The cache miss penalty increases
We need to balance these two constraints
Next time:
  How can we measure cache performance?
  How can we improve cache performance?