gennady pekhimenko , vivek seshadri , yoongu kim, hongyi xin , onur mutlu , todd c. mowry
DESCRIPTION
Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency . Phillip B. Gibbons, Michael A. Kozuch. Gennady Pekhimenko , Vivek Seshadri , Yoongu Kim, Hongyi Xin , Onur Mutlu , Todd C. Mowry. Executive Summary. - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: Gennady Pekhimenko , Vivek Seshadri , Yoongu Kim, Hongyi Xin , Onur Mutlu , Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062501/568166e6550346895ddb244f/html5/thumbnails/1.jpg)
Linearly Compressed Pages: A Main Memory
Compression Framework with Low Complexity and Low Latency
Gennady Pekhimenko, Vivek Seshadri , Yoongu Kim, Hongyi Xin, Onur Mutlu, Todd C. Mowry
Phillip B. Gibbons, Michael A. Kozuch
![Page 2: Gennady Pekhimenko , Vivek Seshadri , Yoongu Kim, Hongyi Xin , Onur Mutlu , Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062501/568166e6550346895ddb244f/html5/thumbnails/2.jpg)
2
Executive Summary Main memory is a limited shared resource Observation: Significant data redundancy Idea: Compress data in main memory Problem: How to avoid inefficiency in address
computation? Solution: Linearly Compressed Pages (LCP): fixed-size cache line granularity compression1. Increases memory capacity (62% on average)2. Decreases memory bandwidth consumption (24%)3. Decreases memory energy consumption (4.9%)4. Improves overall performance (13.9%)
![Page 3: Gennady Pekhimenko , Vivek Seshadri , Yoongu Kim, Hongyi Xin , Onur Mutlu , Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062501/568166e6550346895ddb244f/html5/thumbnails/3.jpg)
3
Potential for Data CompressionSignificant redundancy in in-memory data:
0x00000000
How can we exploit this redundancy?• Main memory compression helps• Provides effect of a larger memory without
making it physically larger
0x0000000B 0x00000003 0x00000004 …
![Page 4: Gennady Pekhimenko , Vivek Seshadri , Yoongu Kim, Hongyi Xin , Onur Mutlu , Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062501/568166e6550346895ddb244f/html5/thumbnails/4.jpg)
4
Challenges in Main Memory Compression
1. Address Computation
2. Mapping and Fragmentation
![Page 5: Gennady Pekhimenko , Vivek Seshadri , Yoongu Kim, Hongyi Xin , Onur Mutlu , Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062501/568166e6550346895ddb244f/html5/thumbnails/5.jpg)
L0 L1 L2 . . . LN-1
Cache Line (64B)
Address Offset 0 64 128 (N-1)*64
L0 L1 L2 . . . LN-1Compressed Page
0 ? ? ?Address Offset
Uncompressed Page
Challenge 1: Address Computation
5
![Page 6: Gennady Pekhimenko , Vivek Seshadri , Yoongu Kim, Hongyi Xin , Onur Mutlu , Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062501/568166e6550346895ddb244f/html5/thumbnails/6.jpg)
Challenge 2: Mapping & Fragmentation
6
Virtual Page (4KB)
Physical Page (? KB) Fragmentation
Virtual Address
Physical Address
![Page 7: Gennady Pekhimenko , Vivek Seshadri , Yoongu Kim, Hongyi Xin , Onur Mutlu , Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062501/568166e6550346895ddb244f/html5/thumbnails/7.jpg)
7
Outline• Motivation & Challenges• Shortcomings of Prior Work• LCP: Key Idea • LCP: Implementation• Evaluation• Conclusion and Future Work
![Page 8: Gennady Pekhimenko , Vivek Seshadri , Yoongu Kim, Hongyi Xin , Onur Mutlu , Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062501/568166e6550346895ddb244f/html5/thumbnails/8.jpg)
8
Key Parameters in Memory CompressionCompressionRatio
Address Comp.Latency
Decompression Latency
Complexityand Cost
![Page 9: Gennady Pekhimenko , Vivek Seshadri , Yoongu Kim, Hongyi Xin , Onur Mutlu , Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062501/568166e6550346895ddb244f/html5/thumbnails/9.jpg)
9
Shortcomings of Prior WorkCompressionMechanisms
CompressionRatio
Address Comp.Latency
Decompression Latency
Complexityand Cost
IBM MXT[IBM J.R.D. ’01]
2X 64 cycles
![Page 10: Gennady Pekhimenko , Vivek Seshadri , Yoongu Kim, Hongyi Xin , Onur Mutlu , Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062501/568166e6550346895ddb244f/html5/thumbnails/10.jpg)
10
Shortcomings of Prior Work (2)CompressionMechanisms
CompressionRatio
Address Comp.Latency
Decompression Latency
Complexity And Cost
IBM MXT[IBM J.R.D. ’01] Robust Main Memory Compression [ISCA’05]
![Page 11: Gennady Pekhimenko , Vivek Seshadri , Yoongu Kim, Hongyi Xin , Onur Mutlu , Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062501/568166e6550346895ddb244f/html5/thumbnails/11.jpg)
11
Shortcomings of Prior Work (3)CompressionMechanisms
CompressionRatio
Address Comp.Latency
Decompression Latency
Complexity And Cost
IBM MXT[IBM J.R.D. ’01] Robust Main Memory Compression [ISCA’05]
LCP: Our Proposal
![Page 12: Gennady Pekhimenko , Vivek Seshadri , Yoongu Kim, Hongyi Xin , Onur Mutlu , Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062501/568166e6550346895ddb244f/html5/thumbnails/12.jpg)
12
Linearly Compressed Pages (LCP): Key Idea
64B 64B 64B 64B . . .
. . .4:1 Compression
64B
Uncompressed Page (4KB: 64*64B)
Compressed Data (1KB)
LCP effectively solves challenge 1: address computation
128
32
Fixed compressed size
![Page 13: Gennady Pekhimenko , Vivek Seshadri , Yoongu Kim, Hongyi Xin , Onur Mutlu , Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062501/568166e6550346895ddb244f/html5/thumbnails/13.jpg)
13
4:1 Compression
E
LCP: Key Idea (2)
64B 64B 64B 64B . . .
. . . M
Metadata (64B)
ExceptionStorage
64B
Uncompressed Page (4KB: 64*64B)
Compressed Data (1KB)
idx
E0
![Page 14: Gennady Pekhimenko , Vivek Seshadri , Yoongu Kim, Hongyi Xin , Onur Mutlu , Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062501/568166e6550346895ddb244f/html5/thumbnails/14.jpg)
14
E
But, wait …
64B 64B 64B 64B . . .
. . . M
4:1 Compression
64B
Uncompressed Page (4KB: 64*64B)
Compressed Data (1KB)
How to avoid 2 accesses ?
Metadata (MD) cache
![Page 15: Gennady Pekhimenko , Vivek Seshadri , Yoongu Kim, Hongyi Xin , Onur Mutlu , Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062501/568166e6550346895ddb244f/html5/thumbnails/15.jpg)
15
Key Ideas: Summary
Fixed compressed size per cache line
Metadata (MD) cache
![Page 16: Gennady Pekhimenko , Vivek Seshadri , Yoongu Kim, Hongyi Xin , Onur Mutlu , Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062501/568166e6550346895ddb244f/html5/thumbnails/16.jpg)
16
Outline• Motivation & Challenges• Shortcomings of Prior Work• LCP: Key Idea • LCP: Implementation• Evaluation• Conclusion and Future Work
![Page 17: Gennady Pekhimenko , Vivek Seshadri , Yoongu Kim, Hongyi Xin , Onur Mutlu , Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062501/568166e6550346895ddb244f/html5/thumbnails/17.jpg)
17
LCP Overview• Page Table entry extension
• compression type and size (fixed encoding)• OS support for multiple page sizes
• 4 memory pools (512B, 1KB, 2KB, 4KB)• Handling uncompressible data• Hardware support
• memory controller logic• metadata (MD) cache
PTE
512B 1KB 2KB 4KB
![Page 18: Gennady Pekhimenko , Vivek Seshadri , Yoongu Kim, Hongyi Xin , Onur Mutlu , Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062501/568166e6550346895ddb244f/html5/thumbnails/18.jpg)
Page Table Entry Extension
18
c-bit (1b)c-type (3b)
Page Table Entry c-size (2b)
c-base (3b)• c-bit (1b) – compressed or uncompressed page• c-type (3b) – compression encoding used• c-size (2b) – LCP size (e.g., 1KB)• c-base (3b) – offset within a page
![Page 19: Gennady Pekhimenko , Vivek Seshadri , Yoongu Kim, Hongyi Xin , Onur Mutlu , Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062501/568166e6550346895ddb244f/html5/thumbnails/19.jpg)
19
Physical Memory Layout
1
4
4KB
2KB 2KB
1KB 1KB 1KB 1KB
512B 512B ... 512B
4KB
…
…
Page Table
PA1
PA2
…
PA2 + 512*1
PA1 + 512*4
PA0
![Page 20: Gennady Pekhimenko , Vivek Seshadri , Yoongu Kim, Hongyi Xin , Onur Mutlu , Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062501/568166e6550346895ddb244f/html5/thumbnails/20.jpg)
20
Memory Request Flow
1. Initial Page Compression
2. Cache Line Read
3. Cache Line Writeback
![Page 21: Gennady Pekhimenko , Vivek Seshadri , Yoongu Kim, Hongyi Xin , Onur Mutlu , Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062501/568166e6550346895ddb244f/html5/thumbnails/21.jpg)
21
Initial Page Compression (1/3)Memory Request Flow (2)
Last-LevelCache
Core TLB
Compress/ Decompress
MemoryController
MD Cache
Processor
Disk
DRAM
4KB
1KB
1. Initial Page Compression2. Cache Line Read
LD
LD
1KB$Line
3. Cache Line Writeback
$Line
2KB
$Line
Cache Line Read (2/3)Cache Line Writeback (3/3)
![Page 22: Gennady Pekhimenko , Vivek Seshadri , Yoongu Kim, Hongyi Xin , Onur Mutlu , Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062501/568166e6550346895ddb244f/html5/thumbnails/22.jpg)
22
Handling Page Overflows• Happens after writebacks, when all slots in the
exception storage are already taken
• Two possible scenarios:• Type-1 overflow: requires larger physical page size
(e.g., 2KB instead of 1KB)• Type-2 overflow: requires decompression and full
uncompressed physical page (e.g., 4KB)
…
$ line
M
Compressed Data
E0
Exception Storage
E1 E2
Happens infrequently -once per ~2M instructions
![Page 23: Gennady Pekhimenko , Vivek Seshadri , Yoongu Kim, Hongyi Xin , Onur Mutlu , Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062501/568166e6550346895ddb244f/html5/thumbnails/23.jpg)
23
Compression Algorithms• Key requirements:
• Low hardware complexity• Low decompression latency• High effective compression ratio
• Frequent Pattern Compression [ISCA’04]
• Uses simplified dictionary-based compression
• Base-Delta-Immediate Compression [PACT’12]
• Uses low-dynamic range in the data
![Page 24: Gennady Pekhimenko , Vivek Seshadri , Yoongu Kim, Hongyi Xin , Onur Mutlu , Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062501/568166e6550346895ddb244f/html5/thumbnails/24.jpg)
24
Base-Delta Encoding [PACT’12]32-byte Uncompressed Cache Line
0xC04039C0 0xC04039C8 0xC04039D0 … 0xC04039F8
0xC04039C0Base
0x00
1 byte
0x08
1 byte
0x10
1 byte
… 0x38 12-byte Compressed Cache Line
20 bytes saved Fast Decompression: vector addition
Simple Hardware: arithmetic and comparison
Effective: good compression ratio
BDI [PACT’12] has two bases:1. zero base (for narrow values)2. arbitrary base (first non-zero
value in the cache line)
![Page 25: Gennady Pekhimenko , Vivek Seshadri , Yoongu Kim, Hongyi Xin , Onur Mutlu , Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062501/568166e6550346895ddb244f/html5/thumbnails/25.jpg)
25
• Memory bandwidth reduction:
• Zero pages and zero cache lines• Handled separately in TLB (1-bit) and in metadata (1-bit per cache line)
LCP-Enabled Optimizations
64B 64B 64B 64B
1 transfer instead of 4
![Page 26: Gennady Pekhimenko , Vivek Seshadri , Yoongu Kim, Hongyi Xin , Onur Mutlu , Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062501/568166e6550346895ddb244f/html5/thumbnails/26.jpg)
26
Outline• Motivation & Challenges• Shortcomings of Prior Work• LCP: Key Idea • LCP: Implementation• Evaluation• Conclusion and Future Work
![Page 27: Gennady Pekhimenko , Vivek Seshadri , Yoongu Kim, Hongyi Xin , Onur Mutlu , Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062501/568166e6550346895ddb244f/html5/thumbnails/27.jpg)
27
Methodology• Simulator: x86 event-driven based on Simics
• Workloads (32 applications)• SPEC2006 benchmarks, TPC, Apache web server
• System Parameters• L1/L2/L3 cache latencies from CACTI [Thoziyoor+, ISCA’08]• 512kB - 16MB L2 caches • DDR3-1066, 1 memory channel
• Metrics• Performance: Instructions per cycle, weighted speedup• Capacity: Effective compression ratio• Bandwidth: Bytes per kilo-instruction (BPKI)• Energy: Memory subsystem energy
![Page 28: Gennady Pekhimenko , Vivek Seshadri , Yoongu Kim, Hongyi Xin , Onur Mutlu , Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062501/568166e6550346895ddb244f/html5/thumbnails/28.jpg)
28
Evaluated DesignsDesign Description
Baseline Baseline (no compression)
RMC Robust main memory compression[ISCA’05](RMC) and FPC[ISCA’04]
LCP-FPC LCP framework with FPC
LCP-BDI LCP framework with BDI[PACT’12]
LZ Lempel-Ziv compression (per page)
![Page 29: Gennady Pekhimenko , Vivek Seshadri , Yoongu Kim, Hongyi Xin , Onur Mutlu , Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062501/568166e6550346895ddb244f/html5/thumbnails/29.jpg)
29
Effect on Memory Capacity32 SPEC2006, databases, web workloads, 2MB L2 cache
LCP-based designs achieve competitive average compression ratios with prior work
0.00.51.01.52.02.5
1.00
1.59 1.52 1.62
2.60Baseline RMC LCP-FPC LCP-BDI LZ
Com
pres
sion
Ra
tio
![Page 30: Gennady Pekhimenko , Vivek Seshadri , Yoongu Kim, Hongyi Xin , Onur Mutlu , Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062501/568166e6550346895ddb244f/html5/thumbnails/30.jpg)
30
Effect on Bus Bandwidth32 SPEC2006, databases, web workloads, 2MB L2 cache
LCP-based designs significantly reduce bandwidth (24%)(due to data compression)
Bett
er
0.00.20.40.60.81.0 1.00
0.79 0.80 0.76
Baseline RMC LCP-FPC LCP-BDI
Norm
alize
d BP
KI
![Page 31: Gennady Pekhimenko , Vivek Seshadri , Yoongu Kim, Hongyi Xin , Onur Mutlu , Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062501/568166e6550346895ddb244f/html5/thumbnails/31.jpg)
31
Effect on Performance
LCP-based designs significantly improve performance over RMC
1-core 2-core 4-core0%2%4%6%8%
10%12%14%16%
RMC LCP-FPC LCP-BDI
Perf
orm
ance
Im
prov
emen
t
![Page 32: Gennady Pekhimenko , Vivek Seshadri , Yoongu Kim, Hongyi Xin , Onur Mutlu , Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062501/568166e6550346895ddb244f/html5/thumbnails/32.jpg)
32
Effect on Memory Subsystem Energy32 SPEC2006, databases, web workloads, 2MB L2 cache
LCP framework is more energy efficient than RMC
Bette
r
0.00.20.40.60.81.01.2
1.00 1.06 0.97 0.95
Baseline RMC LCP-FPC LCP-BDI
Norm
alize
d En
ergy
![Page 33: Gennady Pekhimenko , Vivek Seshadri , Yoongu Kim, Hongyi Xin , Onur Mutlu , Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062501/568166e6550346895ddb244f/html5/thumbnails/33.jpg)
33
Effect on Page Faults32 SPEC2006, databases, web workloads, 2MB L2 cache
LCP framework significantly decreases the number of page faults (up to 23% on average for 768MB)
256MB 512MB 768MB 1GB0
0.20.40.60.8
11.2
8%14% 23%
21%
Baseline LCP-BDI
DRAM Size
Norm
alize
d #
of
Page
Fau
lts
![Page 34: Gennady Pekhimenko , Vivek Seshadri , Yoongu Kim, Hongyi Xin , Onur Mutlu , Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062501/568166e6550346895ddb244f/html5/thumbnails/34.jpg)
34
Other Results and Analyses in the Paper
• Analysis of page overflows• Compressed page size distribution• Compression ratio over time• Number of exceptions (per page)• Detailed single-/multicore evaluation• Comparison with stride prefetching
• performance and bandwidth
![Page 35: Gennady Pekhimenko , Vivek Seshadri , Yoongu Kim, Hongyi Xin , Onur Mutlu , Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062501/568166e6550346895ddb244f/html5/thumbnails/35.jpg)
35
Conclusion• Old Idea: Compress data in main memory• Problem: How to avoid inefficiency in address
computation?• Solution: A new main memory compression framework
called LCP (Linearly Compressed Pages)• Key idea: fixed-size for compressed cache lines within a page
• Evaluation:1. Increases memory capacity (62% on average)2. Decreases bandwidth consumption (24%)3. Decreases memory energy consumption (4.9%)4. Improves overall performance (13.9%)
![Page 36: Gennady Pekhimenko , Vivek Seshadri , Yoongu Kim, Hongyi Xin , Onur Mutlu , Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062501/568166e6550346895ddb244f/html5/thumbnails/36.jpg)
Linearly Compressed Pages: A Main Memory Compression
Framework with Low Complexity and Low Latency
Gennady Pekhimenko, Vivek Seshadri , Yoongu Kim, Hongyi Xin, Onur Mutlu, Todd C. Mowry
Phillip B. Gibbons, Michael A. Kozuch
![Page 37: Gennady Pekhimenko , Vivek Seshadri , Yoongu Kim, Hongyi Xin , Onur Mutlu , Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062501/568166e6550346895ddb244f/html5/thumbnails/37.jpg)
37
Backup Slides
![Page 38: Gennady Pekhimenko , Vivek Seshadri , Yoongu Kim, Hongyi Xin , Onur Mutlu , Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062501/568166e6550346895ddb244f/html5/thumbnails/38.jpg)
38
Large Pages (e.g., 2MB or 1GB)
• Splitting large pages into smaller 4KB sub-pages (compressed individually)
• 64-byte metadata chunks for every sub-page
2KB 2KB
…
2KB 2KBM
![Page 39: Gennady Pekhimenko , Vivek Seshadri , Yoongu Kim, Hongyi Xin , Onur Mutlu , Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062501/568166e6550346895ddb244f/html5/thumbnails/39.jpg)
Physically Tagged Caches
39
Core
TLB
tagtagtag
Physical Address
datadatadata
VirtualAddress
Critical PathAddress Translation
L2 CacheLines
![Page 40: Gennady Pekhimenko , Vivek Seshadri , Yoongu Kim, Hongyi Xin , Onur Mutlu , Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062501/568166e6550346895ddb244f/html5/thumbnails/40.jpg)
40
Changes to Cache Tagging Logic
Before:
tagtag
p-base
datadatadata
CacheLines
tag
• p-base – physical page base address• c-idx – cache line index within the page
After:p-base c-idx
![Page 41: Gennady Pekhimenko , Vivek Seshadri , Yoongu Kim, Hongyi Xin , Onur Mutlu , Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062501/568166e6550346895ddb244f/html5/thumbnails/41.jpg)
41
Analysis of Page Overflows
apache
bzip2
gcc
grom
acs
lbm
libqu
antum
omne
tpp
sjen
g
sphinx3
tpch6
zeusmp
Geo
Mea
n
1E-08
1E-07
1E-06
1E-05
1E-04
1E-03Ty
pe-1
Ove
rflo
ws p
er in
str.
(lo
g-sc
ale)
![Page 42: Gennady Pekhimenko , Vivek Seshadri , Yoongu Kim, Hongyi Xin , Onur Mutlu , Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062501/568166e6550346895ddb244f/html5/thumbnails/42.jpg)
42
Frequent Pattern CompressionIdea: encode cache lines based on frequently occurring patterns, e.g., first half of a word is zero
0x00000001 0x00000000 0xFFFFFFFF 0xABCDEFFF
0x00000001 001
0x00000000 000
0xFFFFFFFF 011
0xABCDEFFF 111
Frequent Patterns:000 – All zeros001 – First half zeros010 – Second half zeros011 – Repeated bytes100 – All ones…111 – Not a frequent pattern
001 0x0001 000 011 0xFF 111 0xABCDEFFF
0x0001
0xFF
0xABCDEFFF
![Page 43: Gennady Pekhimenko , Vivek Seshadri , Yoongu Kim, Hongyi Xin , Onur Mutlu , Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062501/568166e6550346895ddb244f/html5/thumbnails/43.jpg)
43
GPGPU Evaluation
• Gpgpu-sim v3.x• Card: NVIDIA GeForce GTX 480 (Fermi)• Caches:
– DL1: 16 KB with 128B lines– L2: 786 KB with 128B lines
• Memory: GDDR5
![Page 44: Gennady Pekhimenko , Vivek Seshadri , Yoongu Kim, Hongyi Xin , Onur Mutlu , Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062501/568166e6550346895ddb244f/html5/thumbnails/44.jpg)
44
Effect on Bandwidth ConsumptionBF
SM
UM JPEG NN LP
SST
OCO
NS SCP
spm
vsa
d
back
prop
hots
pot
stre
amcl
uste
r
PVC
PVR
InvI
dx SS bfs
bh dmr
mst sp
sssp
GeoM
ean
CUDA Parboil Rodinia Mars Lonestar
0.00.51.01.52.02.53.0
BDI LCP-BDI
Nor
mal
ized
BPK
I
![Page 45: Gennady Pekhimenko , Vivek Seshadri , Yoongu Kim, Hongyi Xin , Onur Mutlu , Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062501/568166e6550346895ddb244f/html5/thumbnails/45.jpg)
45
Effect on ThroughputBF
SM
UM JPEG NN LP
SST
OCO
NS SCP
spm
vsa
d
back
prop
hots
pot
stre
amcl
uste
r
PVC
PVR
InvI
dx SS bfs
bh dmr
mst sp
sssp
GeoM
ean
CUDA Parboil Rodinia Mars Lonestar
0.81.01.21.41.61.8
Baseline BDI
Nor
mal
ized
Per
form
ance
![Page 46: Gennady Pekhimenko , Vivek Seshadri , Yoongu Kim, Hongyi Xin , Onur Mutlu , Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062501/568166e6550346895ddb244f/html5/thumbnails/46.jpg)
46
Physical Memory Layout
1
4
4KB
2KB 2KB
1KB 1KB 1KB 1KB
512B 512B ... 512B
4KB
…
…
Page Table
PA1 c-basePA2
…
PA2 + 512*1
PA1 + 512*4
PA0
![Page 47: Gennady Pekhimenko , Vivek Seshadri , Yoongu Kim, Hongyi Xin , Onur Mutlu , Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062501/568166e6550346895ddb244f/html5/thumbnails/47.jpg)
47
Page Size Distribution
![Page 48: Gennady Pekhimenko , Vivek Seshadri , Yoongu Kim, Hongyi Xin , Onur Mutlu , Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062501/568166e6550346895ddb244f/html5/thumbnails/48.jpg)
48
Compression Ratio Over Time
![Page 49: Gennady Pekhimenko , Vivek Seshadri , Yoongu Kim, Hongyi Xin , Onur Mutlu , Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062501/568166e6550346895ddb244f/html5/thumbnails/49.jpg)
49
IPC (1-core)
![Page 50: Gennady Pekhimenko , Vivek Seshadri , Yoongu Kim, Hongyi Xin , Onur Mutlu , Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062501/568166e6550346895ddb244f/html5/thumbnails/50.jpg)
50
Weighted Speedup
![Page 51: Gennady Pekhimenko , Vivek Seshadri , Yoongu Kim, Hongyi Xin , Onur Mutlu , Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062501/568166e6550346895ddb244f/html5/thumbnails/51.jpg)
51
Bandwidth Consumption
![Page 52: Gennady Pekhimenko , Vivek Seshadri , Yoongu Kim, Hongyi Xin , Onur Mutlu , Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062501/568166e6550346895ddb244f/html5/thumbnails/52.jpg)
52
Page Overflows
![Page 53: Gennady Pekhimenko , Vivek Seshadri , Yoongu Kim, Hongyi Xin , Onur Mutlu , Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062501/568166e6550346895ddb244f/html5/thumbnails/53.jpg)
53
Stride Prefetching - IPC
![Page 54: Gennady Pekhimenko , Vivek Seshadri , Yoongu Kim, Hongyi Xin , Onur Mutlu , Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062501/568166e6550346895ddb244f/html5/thumbnails/54.jpg)
54
Stride Prefetching - Bandwidth