understanding a problem in multicore and how to solve it from onur mutlu carnegie mellon university

11
Understanding a Problem in Multicore and How to Solve It From Onur Mutlu Carnegie Mellon University

Upload: emery-mathews

Post on 16-Dec-2015

212 views

Category:

Documents


3 download

TRANSCRIPT

Page 1: Understanding a Problem in Multicore and How to Solve It From Onur Mutlu Carnegie Mellon University

Understanding a Problem in Multicore and How to Solve It

From Onur MutluCarnegie Mellon University

Page 2: Understanding a Problem in Multicore and How to Solve It From Onur Mutlu Carnegie Mellon University

An Example: Multi-Core Systems

2

CORE 1

L2 C

AC

HE

0

SH

AR

ED

L3 C

AC

HE

DR

AM

INT

ER

FA

CE

CORE 0

CORE 2 CORE 3L

2 CA

CH

E 1

L2 C

AC

HE

2

L2 C

AC

HE

3

DR

AM

BA

NK

S

Multi-CoreChip

*Die photo credit: AMD Barcelona

DRAM MEMORY CONTROLLER

Page 3: Understanding a Problem in Multicore and How to Solve It From Onur Mutlu Carnegie Mellon University

Unexpected Slowdowns in Multi-Core

3

Memory Performance HogLow priority

High priority

(Core 0) (Core 1)

Moscibroda and Mutlu, “Memory performance attacks: Denial of memory service in multi-core systems,” USENIX Security 2007.

Page 4: Understanding a Problem in Multicore and How to Solve It From Onur Mutlu Carnegie Mellon University

A Question or Two Can you figure out why there is a disparity in

slowdowns if you do not know how the processor executes the programs?

Can you fix the problem without knowing what is happening “underneath”?

4

Page 5: Understanding a Problem in Multicore and How to Solve It From Onur Mutlu Carnegie Mellon University

5

Why the Disparity in Slowdowns?

CORE 1 CORE 2

L2 CACHE

L2 CACHE

DRAM MEMORY CONTROLLER

DRAM

Bank 0

DRAM

Bank 1

DRAM

Bank 2

Shared DRAMMemory System

Multi-CoreChip

unfairnessINTERCONNECT

matlab gcc

DRAM

Bank 3

Page 6: Understanding a Problem in Multicore and How to Solve It From Onur Mutlu Carnegie Mellon University

DRAM Bank Operation

6

Row Buffer

(Row 0, Column 0)

Row

dec

oder

Column mux

Row address 0

Column address 0

Data

Row 0Empty

(Row 0, Column 1)

Column address 1

(Row 0, Column 85)

Column address 85

(Row 1, Column 0)

HITHIT

Row address 1

Row 1

Column address 0

CONFLICT !

Columns

Row

s

Access Address:

Page 7: Understanding a Problem in Multicore and How to Solve It From Onur Mutlu Carnegie Mellon University

7

DRAM Controllers

A row-conflict memory access takes significantly longer than a row-hit access

Current controllers take advantage of the row buffer

Commonly used scheduling policy (FR-FCFS) [Rixner 2000]*

(1) Row-hit first: Service row-hit memory accesses first(2) Oldest-first: Then service older accesses first

This scheduling policy aims to maximize DRAM throughput

*Rixner et al., “Memory Access Scheduling,” ISCA 2000.*Zuravleff and Robinson, “Controller for a synchronous DRAM …,” US Patent 5,630,096, May 1997.

Page 8: Understanding a Problem in Multicore and How to Solve It From Onur Mutlu Carnegie Mellon University

8

The Problem Multiple threads share the DRAM controller DRAM controllers designed to maximize DRAM

throughput

DRAM scheduling policies are thread-unfair Row-hit first: unfairly prioritizes threads with high row buffer

locality Threads that keep on accessing the same row

Oldest-first: unfairly prioritizes memory-intensive threads

DRAM controller vulnerable to denial of service attacks Can write programs to exploit unfairness

Page 9: Understanding a Problem in Multicore and How to Solve It From Onur Mutlu Carnegie Mellon University

// initialize large arrays A, B

for (j=0; j<N; j++) { index = rand(); A[index] = B[index]; …}

9

A Memory Performance Hog

STREAM

- Sequential memory access - Very high row buffer locality (96% hit rate)- Memory intensive

RANDOM

- Random memory access- Very low row buffer locality (3% hit rate)- Similarly memory intensive

// initialize large arrays A, B

for (j=0; j<N; j++) { index = j*linesize; A[index] = B[index]; …}

streaming random

Moscibroda and Mutlu, “Memory Performance Attacks,” USENIX Security 2007.

Page 10: Understanding a Problem in Multicore and How to Solve It From Onur Mutlu Carnegie Mellon University

10

What Does the Memory Hog Do?

Row Buffer

Row

dec

oder

Column mux

Data

Row 0

T0: Row 0

Row 0

T1: Row 16

T0: Row 0T1: Row 111

T0: Row 0T0: Row 0T1: Row 5

T0: Row 0T0: Row 0T0: Row 0T0: Row 0T0: Row 0

Memory Request Buffer

T0: STREAMT1: RANDOM

Row size: 8KB, cache block size: 64B128 (8KB/64B) requests of T0 serviced before T1

Moscibroda and Mutlu, “Memory Performance Attacks,” USENIX Security 2007.

Page 11: Understanding a Problem in Multicore and How to Solve It From Onur Mutlu Carnegie Mellon University

Now That We Know What Happens Underneath How would you solve the problem?

What is the right place to solve the problem? Programmer? System software? Compiler? Hardware (Memory controller)? Hardware (DRAM)? Circuits?

Two other goals of this course: Make you think critically Make you think broadly

11

Microarchitecture

ISA (Architecture)

Program/Language

Algorithm

Problem

Logic

Circuits

Runtime System(VM, OS, MM)

Electrons