www.inf.ed.ac.uk July 16, 2015 Kyungpook National University Memory System Support for Online Data-Intensive Services Boris Grot School of Informatics University of Edinburgh


Page 1: Memory System Support for Online Data-Intensive Services · bkict-ocw.knu.ac.kr/include/download.html?fn=55C44E85C53BF.pdf

www.inf.ed.ac.uk July 16, 2015 Kyungpook National University

Memory System Support for Online Data-Intensive Services

Boris Grot School of Informatics

University of Edinburgh

Page 2

The Big Data Explosion

Image: Erik Fitzpatrick

Page 3

Memory: the New Efficiency Battleground

Server CPUs getting more efficient:
– Wimpy cores → low energy/op
– Many cores/chip → fewer sockets [SOP]

DRAM:
– Demand for capacity outpacing technology scaling
– Growing contributor to datacenter Total Cost of Ownership (TCO)

[Figure: many-core server chip drawn as a 4x4 grid of cores]

Must innovate in the memory system

Page 4

DRAM 101

Accessed at block granularity
- Page activated in row buffer
  • Energy-intensive operation
- Blocks fetched from row buffer
  • Row buffer hits are 3x lower energy than activations

DRAM organized in pages
- Page consists of multiple cache blocks
  • DRAM page ≠ OS page

Do servers leverage row buffer locality?
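
The energy argument on this slide can be illustrated with a toy open-page model. This is a minimal sketch, not a DRAM simulator: the single bank, the 8 KB page size, the 64-byte block, and the unit-less energy costs are all assumptions made for illustration. Only the 3x ratio between an activation and a row buffer hit comes from the slide.

```python
PAGE_BYTES = 8 * 1024          # assumed DRAM page (row) size
HIT_ENERGY = 1.0               # arbitrary unit for a row buffer hit
ACT_ENERGY = 3.0 * HIT_ENERGY  # slide: hits are 3x lower energy than activations


def access_energy(addresses):
    """Return (activations, hits, energy) for a single-bank open-page model."""
    open_page = None
    activations = hits = 0
    for addr in addresses:
        page = addr // PAGE_BYTES
        if page == open_page:
            hits += 1          # block served from the already-open row buffer
        else:
            activations += 1   # a different page must be activated first
            open_page = page
    energy = activations * ACT_ENERGY + hits * HIT_ENERGY
    return activations, hits, energy


# 16 sequential 64-byte blocks fall in one page: 1 activation, 15 hits.
sequential = [i * 64 for i in range(16)]
# 16 blocks, each in a different page: 16 activations, no hits.
scattered = [i * PAGE_BYTES for i in range(16)]

print(access_energy(sequential))  # (1, 15, 18.0)
print(access_energy(scattered))   # (16, 0, 48.0)
```

Under these assumed costs, the same number of blocks costs well under half the energy when the accesses stay within one DRAM page.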

[Figure: DRAM memory with one page held in the row buffer]

Page 5

Information Stores for Big Data

Pointer-intensive structures store bulk data objects
– Constant-time data object retrievals
– Example structures: hash tables, tree structures
– Example objects: memory-mapped files, SW objects, DB rows

hash tables (e.g., web search, object caching)
trees (e.g., databases, file systems)

[Figure: indexed structure with slots 0-10 and bulk objects A, B, C]

Page 6

Retrieving a Bulk Object

[Figure: application-level structure (slots 0-10, objects A, B, C) and the placement of A, B, C in memory]

Keys spread over the memory space

Bulk objects contiguously allocated

Accesses: fine-grained for key lookups & bulk for data objects
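
The two access granularities can be made concrete with a block-level trace of a single retrieval. The following is a minimal sketch under assumed parameters: the 64-byte cache block, the 8-byte bucket entry, the addresses, and the 1 KB object size are hypothetical and chosen only for illustration.

```python
BLOCK = 64  # assumed cache block size in bytes


def blocks_touched(start, size):
    """Cache-block indices covered by the byte range [start, start + size)."""
    first = start // BLOCK
    last = (start + size - 1) // BLOCK
    return list(range(first, last + 1))


def retrieve(bucket_addr, object_addr, object_size):
    """Block-level trace of one lookup: a key probe, then the bulk object read."""
    key_probe = blocks_touched(bucket_addr, 8)            # one 8-byte bucket entry
    bulk_read = blocks_touched(object_addr, object_size)  # contiguous object
    return key_probe, bulk_read


# Hypothetical layout: bucket 7 of a table at 0x1000, object C of 1 KB at 0x40000.
key_probe, bulk_read = retrieve(0x1000 + 7 * 8, 0x40000, 1024)
print(len(key_probe), len(bulk_read))  # 1 16
```

The key probe touches a single block, while the object read touches 16 consecutive blocks, which is the pattern a row buffer could serve with one activation.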

Page 7

Server Memory Traffic is Bimodal

Bulk accesses account for 60-75% of all memory accesses
– Bulk access: touches ≥ 50% of bytes within a 1 KB region

Are bulk accesses leveraged by memory?
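
The bulk-access definition above translates directly into a small classifier. This is a minimal sketch: the 1 KB region and the 50% threshold come from the slide, while the 64-byte block size and the simplification that a touched block counts as fully touched are assumptions.

```python
from collections import defaultdict

REGION = 1024  # region size from the slide (1 KB)
BLOCK = 64     # assumed cache block size in bytes


def classify_regions(block_addresses):
    """Map each touched 1 KB region to 'bulk' or 'fine-grained'.

    A region is bulk when the distinct blocks touched cover at least
    50% of its bytes, following the slide's definition.
    """
    touched = defaultdict(set)
    for addr in block_addresses:
        touched[addr // REGION].add((addr % REGION) // BLOCK)
    labels = {}
    for region, blocks in touched.items():
        covered = len(blocks) * BLOCK
        labels[region] = "bulk" if covered >= REGION // 2 else "fine-grained"
    return labels


# Region 0: 10 of 16 blocks touched (640 bytes) -> bulk.
# Region 5: a single block touched (64 bytes)   -> fine-grained.
trace = [i * BLOCK for i in range(10)] + [5 * REGION + 128]
print(classify_regions(trace))  # {0: 'bulk', 5: 'fine-grained'}
```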

[Figure: memory accesses (0-100%) split into bulk and fine-grained for Data Serving, Media Streaming, Online Analytics, Web Search, Web Serving]

Page 8

Bulk Accesses Are Poorly Exploited

Row buffer locality poorly exploited
– Requests from multiple cores interleave
– Limited instruction window size restricts memory-level parallelism (MLP)

DRAM page activations are the chief contributor to memory energy (~60%)

[Figure: two per-workload bar charts (Data Serving, Media Streaming, Online Analytics, Web Search, Web Serving): row buffer hits as a share of memory accesses, and activation energy as a share of memory energy]

Need to improve row buffer locality
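
The effect of interleaving can be seen in a toy experiment. This is a minimal sketch under stated assumptions: a single bank with an open-page policy, an 8 KB DRAM page, 64-byte blocks, and two cores whose objects happen to live in different pages of the same bank. None of these parameters come from the slides.

```python
from itertools import chain

PAGE_BYTES = 8 * 1024  # assumed DRAM page size
BLOCK = 64             # assumed cache block size


def row_hit_rate(addresses):
    """Fraction of accesses that hit the open page in a single-bank model."""
    open_page, hits = None, 0
    for addr in addresses:
        page = addr // PAGE_BYTES
        hits += page == open_page  # True counts as 1
        open_page = page
    return hits / len(addresses)


def bulk_read(base, n_blocks=16):
    """Addresses of one core reading a contiguous 1 KB object."""
    return [base + i * BLOCK for i in range(n_blocks)]


core0 = bulk_read(0 * PAGE_BYTES)
core1 = bulk_read(1 * PAGE_BYTES)

back_to_back = core0 + core1                                # one object, then the other
interleaved = list(chain.from_iterable(zip(core0, core1)))  # requests alternate

print(row_hit_rate(back_to_back))  # 0.9375
print(row_hit_rate(interleaved))   # 0.0
```

Both request streams read exactly the same data; only the arrival order differs, yet the interleaved order turns every access into an activation.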

Page 9

Prior Work

Memory access scheduling: prioritize row buffer hits
– Effectiveness limited by instruction window size

Spatial prefetching and scheduled writebacks
– High hardware cost
– Limited opportunity: only a fraction of the memory accesses covered

Need a comprehensive mechanism with low cost
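
Why a row-hit-first scheduler depends on how many requests it can see is easy to show with a toy model. This is a minimal sketch, not the scheduler evaluated in the talk: requests are reduced to DRAM page ids, there is a single bank, and the `window` parameter is a hypothetical stand-in for the number of outstanding requests that the cores' instruction windows can expose.

```python
def schedule(requests, window):
    """Serve page ids in row-hit-first order; return the number of activations.

    Only the `window` oldest pending requests are visible to the scheduler.
    """
    pending = list(requests)
    open_page, activations = None, 0
    while pending:
        visible = pending[:window]
        # Prefer the oldest visible request that hits the open page;
        # otherwise fall back to the oldest request overall.
        pick = next((i for i, p in enumerate(visible) if p == open_page), 0)
        page = pending.pop(pick)
        if page != open_page:
            activations += 1
            open_page = page
    return activations


# Two cores' bulk reads (pages 0 and 1), perfectly interleaved: 0, 1, 0, 1, ...
stream = [0, 1] * 8

# window=1 degenerates to in-order service: 16 activations.
# window=16 sees every request and groups them by page: 2 activations.
for window in (1, 2, 4, 16):
    print(window, schedule(stream, window))
```

Intermediate window sizes fall between these two extremes, which is the sense in which the scheduler's benefit is bounded by the requests it has available to reorder.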

Page 10

Streaming Can Exploit Locality

– Stream contents of the row buffer to last-level cache (LLC)
– Subsequent accesses become LLC hits

[Figure: memory request stream, row buffer, and last-level cache; application accesses to streamed object data become LLC hits]

Challenge: fine-grained accesses cause overfetch
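
The overfetch problem can be quantified with simple accounting. This is a minimal sketch of a naive policy that streams a whole region on every miss; the 16-block region, the region names, and the mix of one bulk object and three key lookups are hypothetical.

```python
REGION_BLOCKS = 16  # assumed: 1 KB region of 64-byte blocks


def stream_everything(misses, used_blocks):
    """Fetch the whole region on every miss; return (fetched, overfetched).

    misses      : region ids that miss in the LLC (one entry per first touch)
    used_blocks : dict mapping region id -> number of blocks the program uses
    """
    fetched = len(misses) * REGION_BLOCKS
    useful = sum(used_blocks[r] for r in misses)
    return fetched, fetched - useful


# Hypothetical mix: one bulk object (all 16 blocks used) and three key
# lookups that each use a single block of their region.
used = {"C": 16, "k1": 1, "k2": 1, "k3": 1}
fetched, wasted = stream_everything(["C", "k1", "k2", "k3"], used)
print(fetched, wasted, wasted / fetched)  # 64 45 0.703125
```

In this made-up mix, about 70% of the streamed blocks are never used, which is why streaming needs to be applied selectively.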

Page 11

BuMP: Bulk Memory Access Prediction and Streaming

Prediction: identify bulk accesses
– For both memory reads and writes

Streaming: upon an access to a bulk object
– Read: Stream entire object into the LLC
– Write: Stream entire object to memory

[MICRO’14]

Page 12

BuMP: Memory Reads

Memory reads: triggered by LLC misses
– Majority (57-75%) go to pages with coarse-grained data

Prediction: associate coarse-grained access regions with code operating on them
– Identify functions that operate on coarse-grained data
  • Use the program counter (PC) of the first access

Streaming: upon a memory reference
– Check if PC belongs to a coarse-grained operation
– Trigger bulk fetch

Low cost as only a few PCs trigger bulk accesses

Page 13

BuMP: Memory Reads Prediction

[Figure: memory and last-level cache]

Page 14

BuMP: Memory Reads Prediction

[Figure: memory, history tracking table, and last-level cache]

Page 15

BuMP: Memory Reads Prediction

[Figure: memory, history tracking table, and last-level cache; structure with slots 0-10 and objects A, B, C; table entries pair data with PCs (A PC1, B PC1, M PC2)]

Page 16

BuMP: Memory Reads Streaming

[Figure: memory, row buffer, history tracking table, and last-level cache; object C passes through the row buffer; PCs shown: PC1, PC2]

Exploit row buffer locality when profitable

Page 17

BuMP: Memory Writes

Memory writes: evictions of modified LLC blocks
– Significant share (21-38%) of DRAM traffic
– Majority (62-86%) go to pages with coarse-grained data

Prediction: track modified LLC-resident coarse-grained data
– Extends tracking table with a modified bit

Streaming: upon writing back an LLC block to memory
– Check if it belongs to a page with coarse-grained data
– Trigger bulk writeback
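
The write-side flow can be sketched in the same spirit. This is a toy model under stated assumptions rather than the actual mechanism: the class and method names are hypothetical, regions and blocks are plain integers, and the set of regions predicted to hold coarse-grained data is supplied by the caller instead of being learned.

```python
class WritebackTracker:
    """Toy tracking table with a modified bit per block (illustrative only)."""

    def __init__(self, bulk_regions):
        self.bulk_regions = set(bulk_regions)  # regions predicted coarse-grained
        self.dirty = {}                        # region -> set of modified blocks

    def on_store(self, region, block):
        """Mark a block as modified while it resides in the LLC."""
        self.dirty.setdefault(region, set()).add(block)

    def on_eviction(self, region, block):
        """Return the blocks to write to DRAM when a block is evicted."""
        blocks = self.dirty.get(region, set())
        if block not in blocks:
            return []  # clean block: nothing to write back
        if region in self.bulk_regions:
            # Bulk writeback: drain every modified block of the region
            # together, so they can share one DRAM page activation.
            self.dirty.pop(region)
            return sorted(blocks)
        blocks.discard(block)
        return [block]


t = WritebackTracker(bulk_regions={3})
for b in range(4):
    t.on_store(3, b)   # four dirty blocks in a bulk region
t.on_store(8, 0)       # one dirty block in a fine-grained region

print(t.on_eviction(3, 2))  # [0, 1, 2, 3]
print(t.on_eviction(8, 0))  # [0]
print(t.on_eviction(3, 0))  # []
```

The last call returns nothing because the earlier bulk writeback already cleaned the remaining blocks of region 3.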

Page 18

Methodology

Server applications [CloudSuite]
– Data serving, Online analytics, Web search, Web serving, Media streaming

Many-core server
– 16-core CMP @ 2.5 GHz
– 16 GB of DRAM

Performance evaluation
– Simics: full-system simulation
– Flexus: cycle-accurate models of CMP & DRAM

Energy consumption
– Custom DRAM energy models based on Micron

Page 19

Evaluation Highlights

– BuMP reduces row activations by 2x over Base-Open
– Small over-fetch rate of ~12%
– Improves performance by ~11% over Base-Open

[Figure: memory energy (0-100%) for Base and BuMP on Data Serving, Media Streaming, Online Analytics, Web Search, Web Serving, split into row activations and row buffer hits & interface]

Improves memory energy per access by 23%

Page 20

BuMP: Summary

Servers access memory in two granularities
– Fine: pointer-intensive data structures
– Coarse: bulk data objects

DRAM does not exploit coarse-grained accesses
– Accesses to different objects are interleaved

BuMP improves server energy efficiency
– Identifies bulk accesses & triggers bulk transfers
– Improves memory energy per access by 23%

[Figure: indexed structure with slots 0-10 and bulk objects A, B, C]