August 8th, 2011, Kevan Thompson: Creating a Scalable Coherent L2 Cache


Page 1: Creating a Scalable Coherent L2 Cache

August 8th, 2011, Kevan Thompson

Page 2: Outline

Motivation

Cache Background

System Overview

Methodology

Progress

Future Work

Page 3: Motivation

Goal

Create a configurable shared Last Level Cache for use in the PolyBlaze system

Page 4: Introduction

[Figure: introduction diagram labelled Zia, Eric, and Kevan]

Page 5: Cache Background

In modern systems, processors outperform main memory, creating a bottleneck

This problem is only exacerbated as more cores contend for the memory

This problem is reduced if each processor maintains a local copy of the data

Page 6: Caches

A cache is a small amount of memory on the same die as the processor

The cache is capable of providing a lower latency and a higher throughput than the main memory

Systems may include multiple cache levels

The smallest and closest cache is the L1 cache; the next level is the L2 cache, and so on

Page 7: Shared Last Level Cache

Acts as a common location for data

Can be used to maintain cache coherency between processors

Does not exist in current MicroBlaze system

We will design our own shared L2 Cache to maintain cache coherency

Page 8: Cache Speeds

In typical systems:

An L1 cache is very fast (1 or 2 cycles)

An L2 cache is slower (10’s of cycles)

Main memory is very slow (100’s of cycles)

Page 9: Cache Speeds

In our system we expect:

The L1 cache to be very fast (1 or 2 cycles)

The L2 cache to be about 10 cycles

Main memory to be faster (10’s of cycles)

In order to model the memory bottleneck of a much faster system we’ll need to stall the Main Memory

Page 10: Direct Mapped Cache

Caches store Data, a Valid Bit and a unique identifier called a tag
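How those three fields are used on a lookup can be shown with a short software sketch (illustrative only, not the FPGA design; the 2048-line, 32-byte-line geometry is taken from the example on the next slide):

```python
# Software sketch of a direct-mapped lookup (illustrative, not the FPGA design).
NUM_LINES = 2048
LINE_SIZE = 32

# Each line stores a Valid bit, a Tag, and the Data for one block.
lines = [{"valid": False, "tag": None, "data": None} for _ in range(NUM_LINES)]

def lookup(address):
    index = (address // LINE_SIZE) % NUM_LINES   # each address maps to exactly one line
    tag = address // (LINE_SIZE * NUM_LINES)     # identifies which block occupies that line
    line = lines[index]
    if line["valid"] and line["tag"] == tag:
        return line["data"]                      # hit
    return None                                  # miss: fetch from the next level
```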

Page 11: Tags

As an example, imagine a system with the following:

32-bit Address Bus and 32-bit Word Size

64-KByte Cache with 32-Byte Line Size

Therefore we have 2048 (2^11) lines
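The address breakdown this example implies (assuming a direct-mapped organization) can be checked with a few lines of arithmetic:

```python
# Field widths for the example above: 64-KByte cache, 32-byte lines, 32-bit addresses.
cache_bytes = 64 * 1024
line_bytes = 32
address_bits = 32

num_lines = cache_bytes // line_bytes                 # 2048 lines
offset_bits = line_bytes.bit_length() - 1             # 5 bits select a byte within a line
index_bits = num_lines.bit_length() - 1               # 11 bits select a line
tag_bits = address_bits - index_bits - offset_bits    # 16 bits are stored as the tag

print(num_lines, offset_bits, index_bits, tag_bits)   # 2048 5 11 16
```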

Page 12: Set-Associative Cache

A cache with n possible locations for each address is called an n-way set-associative cache (sketched below)

4-Way Set-Associative Cache
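A minimal sketch of the n-way lookup, using a hypothetical 4-way geometry: the index now selects a set, and the tag is compared against every Way in that set.

```python
# Software sketch of a 4-way set-associative lookup (geometry is hypothetical).
NUM_SETS = 512      # e.g. 2048 lines split across 4 ways
NUM_WAYS = 4
LINE_SIZE = 32

sets = [[{"valid": False, "tag": None, "data": None} for _ in range(NUM_WAYS)]
        for _ in range(NUM_SETS)]

def lookup(address):
    index = (address // LINE_SIZE) % NUM_SETS
    tag = address // (LINE_SIZE * NUM_SETS)
    for way in sets[index]:                      # the block may live in any of the n Ways
        if way["valid"] and way["tag"] == tag:
            return way["data"]                   # hit
    return None                                  # miss: the replacement policy picks a victim Way
```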

Page 13: Replacement Policies

When an entry needs to be evicted from the cache we need to decide which Way it is evicted from.

To do this we use a replacement policy

LRU

Clock

FIFO

Page 14: LRU

Keep track of when each entry is accessed

Always evict the Least Recently Used

Implemented using a stack

[Figure: LRU stack with the MRU entry at the top and the LRU entry at the bottom, updated after accesses to entries 4 and 2]
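A minimal software sketch of that stack behaviour for one 4-way set (the set size and access sequence are only illustrative): the accessed Way moves to the MRU end, and the victim is always taken from the LRU end.

```python
# Software sketch of stack-based LRU for one 4-way set (illustrative only).
stack = [0, 1, 2, 3]          # position 0 = MRU, last position = LRU

def access(way):
    stack.remove(way)         # pull the accessed Way out of its current position...
    stack.insert(0, way)      # ...and push it back at the MRU end

def victim():
    return stack[-1]          # always evict the Least Recently Used Way

access(3)
access(1)
print(stack, victim())        # [1, 3, 0, 2] 2
```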

Page 15: Clock

For each Way we store a Reference Bit

Also store a pointer to the oldest entry (the Hand)

Starting with the Hand we test and clear each R Bit until we reach one that is 0

[Figure: Ways 0-3 with their Reference Bits as the Hand advances]
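A minimal sketch of that test-and-clear loop for one 4-way set (illustrative only, not the RTL):

```python
# Software sketch of Clock replacement for one 4-way set (illustrative only).
ref_bits = [False] * 4                    # one Reference (R) bit per Way
hand = 0                                  # the Hand points at the oldest candidate

def touch(way):
    ref_bits[way] = True                  # R is set whenever the Way is accessed

def victim():
    global hand
    while ref_bits[hand]:                 # starting at the Hand, test and clear each R bit...
        ref_bits[hand] = False
        hand = (hand + 1) % len(ref_bits)
    chosen = hand                         # ...until one that is already 0 is found
    hand = (chosen + 1) % len(ref_bits)   # advance the Hand past the evicted Way
    return chosen
```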

Page 16: System Overview

Page 17: PolyBlaze L2 Cache

1 to 16-Way Set-Associative Cache

LRU or Clock Replacement Policy

32 or 64 Byte Line Width

64 Bit Memory Interface

Write Back Cache
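Those options imply a simple relationship between the chosen configuration and the number of sets. A hedged sketch (the function name and the assumption that the data array fills the 1 MB off-chip SRAM mentioned on a later slide are mine, not from the slides):

```python
# Hedged sketch of the configuration space listed above.
def l2_geometry(capacity_bytes=1 << 20, ways=4, line_bytes=32):
    assert 1 <= ways <= 16 and line_bytes in (32, 64)
    num_sets = capacity_bytes // (ways * line_bytes)
    return {"ways": ways, "line_bytes": line_bytes, "sets": num_sets}

print(l2_geometry())          # {'ways': 4, 'line_bytes': 32, 'sets': 8192}
```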

Page 18: L2 Cache

Page 19: Reuse Policy

Determines which Way is evicted on Cache Miss

Currently uses LRU Policy

Page 20: Tag Bank

Contains Tags and Valid Bits

Stored on FPGA using BRAMs

Instantiate one bank for each Way

Page 21: Control Unit

Finite State Machine for L2 Cache Pipelining

If a request is outstanding on the NPI, we can service other requests from the SRAM (sketched below)
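A hedged sketch of that behaviour (class and method names are hypothetical; the real design is a finite state machine, and this only illustrates the single-outstanding-miss idea):

```python
# Hedged sketch of servicing SRAM hits while one miss is outstanding on the NPI.
class L2Control:
    def __init__(self):
        self.npi_outstanding = None              # address of a miss waiting on main memory

    def request(self, address, hit_in_sram):
        if hit_in_sram:
            return "serve from SRAM"             # hits proceed even while a miss is pending
        if self.npi_outstanding is None:
            self.npi_outstanding = address       # issue the miss to main memory over the NPI
            return "miss issued to NPI"
        return "stall"                           # this sketch allows only one outstanding miss
```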

Page 22: Data Bank

Control interface for off-chip SRAM

Page 23: SRAM

32-bit ZBT synchronous SRAM

1 MB

Page 24: Methodology

Break the L2 cache into three parts, test each separately, then combine them and test the complete system

SRAM Controller

NPI Interface

L2 Core

Complete L2 Cache

Page 25: SRAM Controller

Create a wrapper that connects the SRAM controller to the MicroBlaze over an FSL

Write a program that will write and read data at every address in the SRAM (sketched after the list below):

Write all 1’s

Write all 0’s

Alternate writing all 1’s and all 0’s

Write Random data
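A hedged sketch of those four patterns in software form (sram_write and sram_read are hypothetical stand-ins for the real FSL-based accesses):

```python
# Illustrative memory test covering the four patterns listed above.
# sram_write(addr, value) and sram_read(addr) are hypothetical stand-ins.
import random

def run_pattern(sram_write, sram_read, num_words, pattern):
    expected = []
    for addr in range(num_words):
        if pattern == "ones":
            value = 0xFFFFFFFF
        elif pattern == "zeros":
            value = 0x00000000
        elif pattern == "alternate":
            value = 0xFFFFFFFF if addr % 2 == 0 else 0x00000000
        else:                                  # "random"
            value = random.getrandbits(32)
        sram_write(addr, value)
        expected.append(value)
    # Read everything back only after all writes, so a stuck address cannot hide.
    return all(sram_read(addr) == expected[addr] for addr in range(num_words))
```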

Page 26: NPI Interface

Uses a custom FSL width, so we cannot test it using the MicroBlaze

Create a hardware test bench to read and write data to all addresses

Write all 1’s

Write all 0’s

Alternate writing all 1’s and all 0’s

Write Random data


Page 27: L2 Core

Simulate the core of the L2 cache in iSim

Write a test bench that will approximate the responses from the L1/L2 Arbiter, SRAM Controller, and NPI Interface

The test bench will write to each line multiple times to create a large number of cache misses
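A hedged sketch of an address pattern with that property (the geometry constants are hypothetical): for every set index, more distinct tags are written than there are Ways, so lines written earlier are evicted, and replaying the sequence then produces a large number of misses.

```python
# Hedged sketch of a miss-heavy address pattern for the L2 core test bench.
NUM_SETS, NUM_WAYS, LINE_SIZE = 512, 4, 32        # hypothetical geometry

def conflict_addresses():
    for index in range(NUM_SETS):
        for tag in range(NUM_WAYS + 1):           # one more tag than Ways forces evictions
            yield (tag * NUM_SETS + index) * LINE_SIZE
```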


Page 28: Complete L2 Cache

Combine the L2 Cache with the rest of PolyBlaze

Write test programs to read and write to various regions of memory


Page 29: Current Progress

SRAM Controller and Data Bank:

Designed and Tested

NPI Interface:

Testing and Debugging in Progress

L2 Core:

Testing and Debugging in Progress

Page 30: Future Work

Add Clock Replacement Policy to L2 Cache

Add a Write Back Buffer to L2 Cache

Migrate System from XUPV5 to a BEE3 so we can create a system with more cores

Modify the L2 Cache into a NUMA system

Add Custom Hardware Accelerators to PolyBlaze

Page 31: Questions?