Post on 18-Dec-2015


On-Chip Cache Analysis

A Parameterized Cache Implementation for a System-on-Chip RISC CPU

Presentation Outline

Informal Introduction
Underpin Design – xr16
Cache Design Issues
Implementation Details
Results & Conclusion
Future Work
Questions

Informal Introduction

Field-Programmable Gate Arrays (FPGAs)
Verilog HDL
System-on-Chip (SoC)
Reduced Instruction Set Computer (RISC)
Caches
Project Theme

Underpin Design – xr16

Classical pipelined RISC
Big-endian, von Neumann architecture
Sixteen 16-bit registers
Forty-two 16-bit instructions
Result forwarding, branch annulment, interlocked instructions

Underpin Design – xr16 (cont’d)

Internal and external buses (CPU-clocked)
Pipelined memory interface
Single-cycle read, 3-cycle write
DMA and interrupt handling support
Ported compiler and assembler
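The single-cycle-read, 3-cycle-write timing above can be sketched as a toy cycle-accounting model. This is purely illustrative; the constants come from the slide, while the function name and the sequence representation are invented for the example.

```python
# Toy cycle-accounting model of the xr16 memory interface described above:
# reads complete in one cycle, writes take three (per the slide).
READ_CYCLES, WRITE_CYCLES = 1, 3

def total_cycles(ops):
    """ops is a sequence of 'R'/'W' memory operations; returns total cycles."""
    return sum(READ_CYCLES if op == 'R' else WRITE_CYCLES for op in ops)
```

For example, a read-read-write sequence costs 1 + 1 + 3 = 5 cycles; minimizing writes (or hiding them behind the pipeline) is what makes the cache's write policy matter.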

Underpin Design – xr16 (cont’d)

Block Diagram

Underpin Design – xr16 (cont’d)

Datapath

Underpin Design – xr16 (cont’d)

Memory Preferences

Underpin Design – xr16 (cont’d)

RAM Interface

Cache Design Issues

Cache Size *
Line Size
Fetch Algorithm
Placement Policy *
Replacement Policy *
Split vs. Unified Cache

(* = configurable parameter in this implementation)

Cache Design Issues (cont’d)

Write-Back Strategy *
Write-Allocate Policy *
Blocking vs. Non-Blocking
Pipelined Transactions
Virtually Addressed Caches
Multilevel Caches

Cache Design Issues (cont’d)

Cache Size: 32 – 256K data bits
Placement Policy: Direct Mapped, Set Associative, Fully Associative
Replacement Policy: FIFO, Random*
Write Back Strategy: Write Back, Write Through
Write Allocate Policy: Write Allocate, Write No Allocate
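For a direct-mapped configuration, the placement decision reduces to slicing the address into tag, index, and offset fields. The sketch below is a minimal illustration; the geometry (16 lines of 4 words) is an assumption for the example, not a parameter taken from the slides.

```python
# Illustrative address breakdown for a direct-mapped cache.
# LINE_WORDS and NUM_LINES are assumed values, chosen only for this example.
LINE_WORDS = 4    # words per cache line
NUM_LINES = 16    # lines in the cache

def split_address(addr):
    """Split a word address into (tag, index, offset) fields."""
    offset = addr % LINE_WORDS                    # position within the line
    index = (addr // LINE_WORDS) % NUM_LINES      # which line the word maps to
    tag = addr // (LINE_WORDS * NUM_LINES)        # disambiguates aliased addresses
    return tag, index, offset
```

With these sizes, address 295 splits into tag 4, index 9, offset 3; any two addresses with the same index but different tags compete for the same line, which is what forces the replacement cases discussed later.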

Implementation Details

Configurable Parameters

Cache Size
Placement Strategy
Write Back Policy
Write Allocate Policy
Replacement Policy
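One way to picture these five configurable parameters is as a single configuration record. The field names and defaults below are invented for illustration; they are not the actual Verilog parameter names used in the implementation.

```python
# Hedged sketch of the implementation's configurable parameters as a record.
# Field names and defaults are illustrative, not the real Verilog parameters.
from dataclasses import dataclass

@dataclass
class CacheConfig:
    size_bits: int = 2048            # total data bits; slides give a 32 - 256K range
    placement: str = "direct"        # "direct", "set_assoc", or "fully_assoc"
    write_back: bool = True          # write-back (True) vs. write-through (False)
    write_allocate: bool = True      # allocate on a write miss or not
    replacement: str = "random"      # "fifo" or "random", per the slides
```

Grouping the parameters this way mirrors how a parameterized Verilog module would take them: one set of compile-time knobs that together select which of the hit/miss cases below can actually occur.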

Implementation Details (cont’d)

1. Miss Read, Replacement NOT Required
Let the memory operation complete and place the fetched data from memory in the cache.

Implementation Details (cont’d)

2. Miss Read, Replacement Required
Initiate a write memory operation and write back the set to be replaced. Then initiate a read operation for the desired data.

Implementation Details (cont’d)

3. Miss Write, No Allocate
Let the memory operation complete and do nothing else.

Implementation Details (cont’d)

4. Miss Write, Allocate, Write-Through
Let the memory operation complete and place the new data in the cache.

Implementation Details (cont’d)

5. Miss Write, Allocate, Write-Back, Replacement NOT Required
Cancel the memory operation, update only the cache, and mark the data dirty.

Implementation Details (cont’d)

6. Miss Write, Allocate, Write-Back, Replacement Required
Instead of writing the data that caused the write miss, write back the set that is to be replaced, then update the cache with the data that caused the miss.

Implementation Details (cont’d)

7. Hit Read
Cancel the memory operation and provide the data for either an instruction fetch or a data load instruction.

Implementation Details (cont’d)

8. Hit Write, Write-Through
Let the memory operation complete, then update the cache when it completes.

Implementation Details (cont’d)

9. Hit Write, Write-Back
Cancel the memory operation and update the cache.

Implementation Details (cont’d)

1. Read Hit
2. Write Hit
3. Read Miss (replacement)
4. Read Miss (no replacement)
5. Write Miss (replacement)
6. Write Miss (no replacement)
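The nine cases above can be sketched as a small behavioral model. This is a minimal Python illustration assuming a direct-mapped, word-addressed cache with one word per line; all class and method names are invented for the example, and "cancel the memory operation" is modeled simply as not touching backing memory.

```python
# Behavioral sketch of the nine hit/miss cases, for a direct-mapped cache
# with one word per line. Names are illustrative, not from the xr16 sources.
class Line:
    def __init__(self):
        self.valid = False
        self.dirty = False
        self.tag = None
        self.data = 0

class SimpleCache:
    def __init__(self, memory, n_lines=8, write_back=True, write_allocate=True):
        self.mem = memory                  # backing store: a list of words
        self.n = n_lines
        self.write_back = write_back
        self.write_allocate = write_allocate
        self.lines = [Line() for _ in range(n_lines)]

    def _line(self, addr):
        return self.lines[addr % self.n], addr // self.n

    def read(self, addr):
        line, tag = self._line(addr)
        if line.valid and line.tag == tag:       # case 7: hit read, serve from cache
            return line.data
        if line.valid and line.dirty:            # case 2: write back the victim first
            self.mem[line.tag * self.n + addr % self.n] = line.data
        line.valid, line.dirty = True, False     # case 1: fill line from memory
        line.tag, line.data = tag, self.mem[addr]
        return line.data

    def write(self, addr, value):
        line, tag = self._line(addr)
        if line.valid and line.tag == tag:       # write hit
            if self.write_back:                  # case 9: cache only, mark dirty
                line.data, line.dirty = value, True
            else:                                # case 8: memory completes, then cache
                self.mem[addr] = value
                line.data = value
            return
        if not self.write_allocate:              # case 3: memory only, nothing else
            self.mem[addr] = value
            return
        if not self.write_back:                  # case 4: memory completes, fill cache
            self.mem[addr] = value
            line.valid, line.dirty = True, False
            line.tag, line.data = tag, value
            return
        if line.valid and line.dirty:            # case 6: write back victim first
            self.mem[line.tag * self.n + addr % self.n] = line.data
        line.valid, line.dirty = True, True      # case 5: cache only, mark dirty
        line.tag, line.data = tag, value
```

A short trace shows the write-back behavior: after a write hit, backing memory still holds the stale value until a conflicting access evicts the dirty line.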

Results & Conclusion

Proof of Concept
Rigid Design Parameters
R&D Options
Architecture Innovation

Future Work

LRU Implementation
Victim Cache Buffer
Split Caches
Level 2 Cache
Pipeline Enrichment
Multiprocessor Support

Questions
