Modern DRAM Memory Systems · mercury.pr.erau.edu/~davisb22/papers/mem_wall.isca2k... · 2000-06-26 ·...
Brian T. Davis, page 1
Modern DRAM Memory Systems
Brian T. Davis
MTU Interview Seminar
Advanced Computer Architecture Laboratory, University of Michigan
April 24, 2000
● Introduction
  ❍ Memory system
  ❍ Research objective
● DRAM Primer
  ❍ Array
  ❍ Access sequence
  ❍ SDRAM
  ❍ Motivation for further innovation
● Modern DRAM Architectures
  ❍ DRDRAM
  ❍ DDR2
  ❍ Cache-enhanced DDR2 low-latency variants
● Performance and Controller Policy Research
  ❍ Simulation methodologies
  ❍ Results
● Conclusions
● Future Work
Processor Memory System
● Architecture overview
  ❍ This is the architecture of most desktop systems
  ❍ Cache configurations may vary
  ❍ The DRAM controller is typically an element of the chipset
  ❍ The speed of every bus can vary depending on the system
● DRAM Latency Problem
[Figure: processor memory system. CPU and primary cache; secondary cache on the backside bus; north-bridge chipset containing the DRAM controller on the frontside bus; DRAM system on the DRAM bus; other chipset devices and I/O systems.]
Research Objective
● Determine the highest-performance memory controller policy for each DRAM architecture
● Compare the performance of various DRAM architectures for different classes of applications, with each architecture operating under its best controller policy
DRAM Array
❍ One transistor and one capacitor per bit (current devices hold 256 or 512 Mbit)
❍ Three events in the hardware access sequence:
  ● Precharge
  ● Energize the word line selected by the decoded (de-muxed) row address
  ● Select bits from the row held in the sense amps
❍ Refresh is mandatory
❍ "Page" and "row" are synonymous terms
[Figure: DRAM array. Word lines and bit lines, with the row decoder, sense amplifiers, and column decoder.]
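The three-event access sequence can be sketched as a minimal one-bank model. This is an illustrative assumption, not the simulator used in this work: the class `Bank` and its method are hypothetical, the row is left in the sense amps after each access (an open-page assumption), and refresh is not modeled. The model only reports which hardware events each access needs.

```python
from typing import List, Optional


class Bank:
    """Hypothetical one-bank model that tracks which row sits in the sense amps."""

    def __init__(self) -> None:
        self.open_row: Optional[int] = None  # None: bank idle and already precharged

    def access(self, row: int, col: int) -> List[str]:
        """Return the hardware events needed to read (row, col), in order."""
        events: List[str] = []
        if self.open_row != row:
            if self.open_row is not None:
                # Bit lines must be precharged before a different word line is energized.
                events.append("precharge")
            # The decoded row address energizes one word line; the row lands in the sense amps.
            events.append(f"activate row {row}")
            self.open_row = row
        # The column decoder selects bits from the row held in the sense amps.
        events.append(f"select col {col}")
        return events


bank = Bank()
print(bank.access(5, 0))  # ['activate row 5', 'select col 0']
print(bank.access(5, 1))  # ['select col 1']
print(bank.access(9, 3))  # ['precharge', 'activate row 9', 'select col 3']
```

A second access to the same row skips precharge and activation entirely, which is why page and row hits matter so much for latency.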
Arrays per Device
● Multiple arrays per device, and their aspect ratio
  ❍ Larger arrays: longer bit lines, higher capacitance, higher latency
  ❍ Multiple smaller arrays: lower latency, more concurrency (if the interface allows it)
  ❍ Trade-off: fewer, larger arrays are cheaper; more, smaller arrays give higher performance
● Controller policies
  ❍ Close-Page Auto Precharge (CPA)
  ❍ Open-Page (OP)
Fast-Page-Mode (FPM) DRAM Interface
❍ The DRAM controller provides all signals required by the DRAM array
❍ Three events in the FPM interface access sequence:
  ● Row Address Strobe (RAS)
  ● Column Address Strobe (CAS)
  ● Data response
❍ Dedicated interface: only a single transaction at any time
❍ Address bus multiplexed between row and column
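[Figure: FPM read timing. Signals RAS, CAS, Address, and Data; a single row address is followed by column addresses Col 1, Col 2, Col 3, returning Data 1, Data 2, Data 3.]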
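Because the FPM address bus is multiplexed, the controller splits each address into a row half (sent with RAS) and a column half (sent with CAS); page mode then reissues only the column half. A minimal sketch, assuming a hypothetical device with 10 column bits and a simple row-above-column bit order; real devices and controllers choose their own widths and mappings, and the function names are illustrative only.

```python
from typing import List, Optional, Tuple

COL_BITS = 10  # hypothetical device: 1024 columns per row


def split_address(addr: int, col_bits: int = COL_BITS) -> Tuple[int, int]:
    """Split a word address into the (row, column) halves sent over the shared address bus."""
    return addr >> col_bits, addr & ((1 << col_bits) - 1)


def fpm_strobes(addrs: List[int]) -> List[str]:
    """List the RAS/CAS strobes for a run of reads, using page mode while the row repeats."""
    strobes: List[str] = []
    open_row: Optional[int] = None
    for addr in addrs:
        row, col = split_address(addr)
        if row != open_row:
            # New row: the row half is driven on the address bus with RAS.
            # (Closing a previous row also needs a RAS precharge, omitted here.)
            strobes.append(f"RAS row={row}")
            open_row = row
        # Page mode: only the column half is driven, with CAS.
        strobes.append(f"CAS col={col}")
    return strobes


print(fpm_strobes([0x1401, 0x1402, 0x1403]))
# ['RAS row=5', 'CAS col=1', 'CAS col=2', 'CAS col=3']
```

The output mirrors the timing diagram: one row strobe, then three column strobes, each producing one datum.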
SDRAM Interface
❍ All I/O is synchronous rather than asynchronous, and buffered on the device
❍ Split-transaction interface
❍ Allows pipeline-like concurrency among accesses to different banks
❍ Requires latches for address and data, a low device overhead
❍ Double Data Rate (DDR) increases only the data transition frequency
SDRAM DIMM/System Architecture
❍ The number of devices per DIMM affects the effective page size, and thus potentially performance
❍ Each device covers only a "slice" of the data bus
❍ DIMMs can be single- or double-sided (single-sided shown)
❍ Data I/O width per device is a bond-out issue
  ● It has been increasing as devices get larger
[Figure: 168-pin SDRAM DIMM interface. The DRAM controller connects to a single-sided DIMM of eight x8 devices over an address bus and a 64-bit data bus; additional DIMMs attach to the same buses.]
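The DIMM arithmetic can be made explicit: devices in parallel each drive one slice of the 64-bit data bus, and they activate the same row together, so the effective page is the device page times the device count. A sketch under assumed numbers: eight x8 devices as in the figure, and a hypothetical 1 KByte page per device (not a value from the slides). The function names are illustrative.

```python
from typing import List, Tuple

DATA_BUS_BITS = 64


def device_slices(devices: int, width: int) -> List[Tuple[int, int]]:
    """Return the (low, high) data-bus bit range that each device drives."""
    if devices * width != DATA_BUS_BITS:
        raise ValueError("devices x width must equal the data bus width")
    return [(i * width, (i + 1) * width - 1) for i in range(devices)]


def effective_page_bytes(devices: int, device_page_bytes: int) -> int:
    """Devices on one side of the DIMM open the same row together, so their pages add up."""
    return devices * device_page_bytes


print(device_slices(8, 8)[:2])        # [(0, 7), (8, 15)]
print(effective_page_bytes(8, 1024))  # 8192
```

With four x16 devices instead, `effective_page_bytes(4, 1024)` gives 4096: the same device page yields half the effective page, which is one way the device count can affect performance.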
Motivation for a New DRAM Architecture
● SDRAM limits the performance of high-performance processors
  ❍ TPC-C: 4-wide-issue machines achieve a CPI of 4.2-4.5 (DEC)
  ❍ STREAM, 8-wide machine: CPI of 3.6-9.7 at 1 GHz; CPI of 7.7-42.0 at 5 GHz
  ❍ PERL, 8-wide machine: CPI of 0.8-1.1 at 1 GHz; CPI of 1.0-4.7 at 5 GHz
● The DRAM array has remained essentially unchanged for 25 years
  ❍ Device capacity grows 4x every 3 years (Moore's law)
  ❍ Processor performance (not clock speed) grows 60% annually
  ❍ Latency decreases 7% annually
● Bandwidth vs. latency
  ❍ Potential bandwidth = (data bus width) × (operating frequency)
  ❍ 64-bit desktop bus at 100-133 MHz (0.8-1.064 GB/s)
  ❍ 256-bit server (parity) bus at 83-100 MHz (2.666-3.2 GB/s)
● Workstation manufacturers are migrating to enhanced DRAM
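The potential-bandwidth formula can be checked against the quoted bus figures. A minimal sketch: `transfers_per_clock` is an added parameter (1 for these single-data-rate SDRAM buses, 2 for double-data-rate ones), GB/s is taken as 10^9 bytes per second, and the 83 MHz bus is entered as 83.33 MHz, which is what the 2.666 GB/s figure implies.

```python
def potential_bandwidth_gbps(bus_bits: int, freq_mhz: float, transfers_per_clock: int = 1) -> float:
    """Peak bandwidth in GB/s (10**9 bytes/s) = bus width x frequency x transfers per clock."""
    bytes_per_transfer = bus_bits / 8
    return bytes_per_transfer * freq_mhz * 1e6 * transfers_per_clock / 1e9


for label, bits, mhz in [("64-bit @ 100 MHz", 64, 100), ("64-bit @ 133 MHz", 64, 133),
                         ("256-bit @ 83.33 MHz", 256, 83.33), ("256-bit @ 100 MHz", 256, 100)]:
    print(f"{label}: {potential_bandwidth_gbps(bits, mhz):.3f} GB/s")
```

The same formula with `transfers_per_clock=2` gives 1.6 GB/s for a 16-bit, 400 MHz double-data-rate channel.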
Modern DRAM Architectures
● DRAM architectures examined
  ❍ PC100: baseline SDRAM
  ❍ DDR133 (PC2100): SDRAM expected within 9 months
  ❍ Rambus → Concurrent Rambus → Direct Rambus
  ❍ DDR2
  ❍ Cache-enhanced architecture: applicable to any interface; applied here to DDR2
● Not every novel DRAM is discussed here
  ❍ SyncLink: death by standards organization
  ❍ Cached DRAM: two-port, single-solution part for notebooks
  ❍ MultiBanked DRAM: low-latency core with many small banks
● Common elements
  ❍ The interface should enable parallelism between accesses to different banks
  ❍ Exploit the extra bits retrieved but not requested
● Focus on DDR2 low-latency variants
  ❍ JEDEC 42.3 Future DRAM Task Group
  ❍ Low-Latency DRAM Working Group
DRDRAM RIMM/System Architecture
❍ Smaller arrays: 32 per 128 Mbit device (4 Mbit arrays; 1 KByte page)
❍ Devices in series on the RIMM rather than in parallel
❍ Many more banks than in an SDRAM memory system of equivalent size
❍ Sense amps are shared between neighboring banks
❍ Clock flows in both directions along the channel
Direct Rambus (DRDRAM) Channel
❍ Narrow bus architecture
❍ All activity occurs in octcycles (4 clock cycles; 8 signal transitions)
❍ Three bus components:
  ● Row (3 bits); Col (5 bits); Data (16 bits)
❍ Allows 3 transactions to use the bus concurrently
❍ All signals are double data rate (DDR)
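The octcycle arithmetic connects the channel description to the 1.6 GB/s potential bandwidth listed for DRDRAM. A sketch assuming the 400 MHz channel clock given for DRDRAM in the architecture comparison table, with every signal transferring on both clock edges; the names are illustrative.

```python
CLOCK_MHZ = 400            # DRDRAM channel clock, from the comparison table
CYCLES_PER_OCTCYCLE = 4
EDGES_PER_CYCLE = 2        # every signal is double data rate
BUS_BITS = {"row": 3, "col": 5, "data": 16}


def bits_per_octcycle(component: str) -> int:
    """Bits one bus component carries in one octcycle (4 cycles x 2 edges = 8 transitions)."""
    return BUS_BITS[component] * CYCLES_PER_OCTCYCLE * EDGES_PER_CYCLE


octcycle_ns = CYCLES_PER_OCTCYCLE * 1000 / CLOCK_MHZ  # 10.0 ns per octcycle
data_bytes = bits_per_octcycle("data") // 8            # 16 bytes per octcycle
bandwidth_gb_s = data_bytes / octcycle_ns              # bytes per ns equals GB/s
print(bits_per_octcycle("row"), bits_per_octcycle("col"), data_bytes, bandwidth_gb_s)
# 24 40 16 1.6
```

Because the row, column, and data fields use separate wires, a row packet for one transaction, a column packet for another, and data for a third can occupy the same octcycle, which is the three-way concurrency described above.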
DDR2 Architecture
❍ Four arrays per 512 Mbit device
❍ Simulations assume 4 (x16) devices per DIMM
❍ Few, large arrays: 64 MByte effective banks, 8 KByte effective pages
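The effective bank and page sizes follow from the device organization. A sketch of the arithmetic: the 2 KByte per-device page is inferred here from the stated 8 KByte effective page and four devices, not taken from a datasheet, and the row count assumes one page equals one row of one array.

```python
MIB = 1 << 20
DEVICE_MBIT = 512
ARRAYS_PER_DEVICE = 4
DEVICES_PER_DIMM = 4             # x16 devices, as assumed in the simulations
DEVICE_WIDTH_BITS = 16
EFFECTIVE_PAGE_BYTES = 8 * 1024  # stated on the slide

array_bytes = DEVICE_MBIT * MIB // 8 // ARRAYS_PER_DEVICE     # one bank within one device
effective_bank_mib = array_bytes * DEVICES_PER_DIMM // MIB    # a bank spans all devices
device_page_bytes = EFFECTIVE_PAGE_BYTES // DEVICES_PER_DIMM  # inferred, not from a datasheet
rows_per_array = array_bytes // device_page_bytes             # implied if one page = one row
data_bits = DEVICE_WIDTH_BITS * DEVICES_PER_DIMM

print(effective_bank_mib, device_page_bytes, rows_per_array, data_bits)
# 64 2048 8192 64
```

Four x16 devices also fill exactly the 64-bit DIMM data bus.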
DDR2 Interface
❍ Changes from the current SDRAM interface
  ● Additive latency (AL = 2; CL = 3 in this figure)
  ● Fixed burst size of 4
  ● Power-reduction considerations
❍ Leverages existing knowledge
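With additive latency, the controller may issue the read command right after the activate; the device holds it for AL cycles before the internal column access, so read latency is AL + CL. A minimal sketch using the slide's AL = 2 and CL = 3; the one-cycle gap between activate and read and the helper name are assumptions for illustration, not a JEDEC-defined interface.

```python
from typing import Dict


def ddr2_read_timeline(al: int, cl: int, burst: int = 4, act_cycle: int = 0) -> Dict[str, int]:
    """Clock cycles for one posted-CAS read (hypothetical helper)."""
    read_cmd = act_cycle + 1                 # assumption: READ posted one cycle after ACT
    internal_cas = read_cmd + al             # device holds the READ for AL cycles
    first_data = read_cmd + al + cl          # read latency RL = AL + CL
    last_data = first_data + burst // 2 - 1  # DDR: two beats per clock, so BL4 spans 2 clocks
    return {"ACT": act_cycle, "READ": read_cmd, "internal CAS": internal_cas,
            "first data": first_data, "last data": last_data}


print(ddr2_read_timeline(al=2, cl=3))
# {'ACT': 0, 'READ': 1, 'internal CAS': 3, 'first data': 6, 'last data': 7}
```

The READ no longer has to wait on the command bus until the row is ready: the column access still begins at cycle 3 (an implied 3-cycle activate-to-read delay), but that command slot stays free for another bank. With the fixed burst of 4 on a 64-bit DIMM, each read returns 4 × 8 = 32 bytes.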
EMS Cache-Enhanced Architecture
❍ Full-row SRAM cache for each bank
❍ Precharge latency can always be hidden
❍ Adds a No-Write-Transfer capability
❍ The controller requires no additional storage, only control logic for No-Write-Transfer
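A rough model of the EMS behavior: each bank keeps its last-activated row in SRAM, accesses that hit the cached row are served from SRAM, and because the data now lives in SRAM the array can precharge in the background. No-Write-Transfer is modeled as a write that goes to the array without replacing the cached row. This is a simplified sketch under those assumptions (hypothetical names, hit/miss counts only, data values not modeled), not the EMS specification.

```python
from typing import Optional, Tuple


class EmsBank:
    """Hypothetical EMS-style bank with one row-sized SRAM cache line."""

    def __init__(self) -> None:
        self.cached_row: Optional[int] = None
        self.hits = 0
        self.misses = 0

    def _touch(self, row: int) -> None:
        if row == self.cached_row:
            self.hits += 1
        else:
            # Miss: activate the row and copy it into SRAM; the array can then
            # precharge while later accesses are served from the SRAM copy.
            self.misses += 1
            self.cached_row = row

    def read(self, row: int) -> None:
        self._touch(row)

    def write(self, row: int, no_write_transfer: bool = False) -> None:
        if no_write_transfer:
            # Write goes straight to the DRAM array; the cached row is left in place.
            # These writes are not counted as hits or misses.
            return
        self._touch(row)  # ordinary write: the target row replaces the cached row


def run(no_write_transfer: bool) -> Tuple[int, int]:
    bank = EmsBank()
    for _ in range(4):
        bank.read(1)                                        # reads keep returning to row 1
        bank.write(7, no_write_transfer=no_write_transfer)  # interleaved write-only stream
    return bank.hits, bank.misses


print(run(False))  # (0, 8)
print(run(True))   # (3, 1)
```

The controller's problem is deciding when a write stream is write-only, so that routing it around the cache does not cost a later read miss.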
Virtual Channel Architecture
❍ Channels are SRAM cache on the DRAM die: 16 channels = 16-line cache
❍ Reads and writes can occur only through a channel
❍ The controller can manage channels in many ways
  ● FIFO
  ● Bus-master based
❍ Controller complexity and storage increase dramatically
❍ Designed to reduce conflict misses
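The virtual-channel organization can be sketched as a small, fully associative cache of 16 channels, each holding a quarter-row segment, with the FIFO allocation policy listed above. This is an illustrative model under assumed parameters (1024 columns per row, hypothetical names), not the VC-SDRAM command set.

```python
from collections import OrderedDict

NUM_CHANNELS = 16      # 16 channels = 16-line cache
SEGMENTS_PER_ROW = 4   # each channel holds 1/4 of a row


class VirtualChannels:
    """Hypothetical model: fully associative channel cache with FIFO allocation."""

    def __init__(self, cols_per_row: int = 1024) -> None:
        self.seg_cols = cols_per_row // SEGMENTS_PER_ROW
        self.channels = OrderedDict()  # (row, segment) -> dirty flag, oldest fill first
        self.hits = self.misses = self.writebacks = 0

    def access(self, row: int, col: int, is_write: bool = False) -> None:
        key = (row, col // self.seg_cols)
        if key in self.channels:
            self.hits += 1
        else:
            # Every read or write goes through a channel, so a miss must fill one.
            self.misses += 1
            if len(self.channels) == NUM_CHANNELS:
                _, dirty = self.channels.popitem(last=False)  # FIFO: evict the oldest fill
                if dirty:
                    self.writebacks += 1  # a dirty channel is written back to the array
            self.channels[key] = False
        if is_write:
            self.channels[key] = True  # updating a value does not change FIFO order


vc = VirtualChannels()
for row in range(17):                  # 17 distinct segments overflow the 16 channels
    vc.access(row, 0, is_write=(row == 0))
vc.access(16, 255)                     # same quarter-row segment as (16, 0): a hit
print(vc.hits, vc.misses, vc.writebacks)  # 1 17 1
```

Even this small model exposes the controller questions raised later: which channel to allocate, and when to pay for writing back a dirty one.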
| | PC133 | DDR2 | DDR2_VC | DDR2_EMS | DRDRAM |
|---|---|---|---|---|---|
| Potential bandwidth | 1.064 GB/s | 3.2 GB/s | 3.2 GB/s | 3.2 GB/s | 1.6 GB/s |
| Interface | Bus; 64 data bits; 168 pads on DIMM; 133 MHz | Bus; 64 data bits; 184 pads on DIMM; 200 MHz | Bus; 64 data bits; 184 pads on DIMM; 200 MHz | Bus; 64 data bits; 184 pads on DIMM; 200 MHz | Channel; 16 data bits; 184 pads on RIMM; 400 MHz |
| Latency to first 64 bits (min : max) | 3 : 9 cycles (22.5 : 66.7 ns) | 3.5 : 9.5 cycles (17.5 : 47.5 ns) | 2.5 : 18.5 cycles (12.5 : 92.5 ns) | 3.5 : 9.5 cycles (17.5 : 47.5 ns) | 14 : 32 cycles (35 : 80 ns) |
| Latency advantage | | | 16-line cache per device; line size is 1/4 row | One cache line per bank; line size is row size | Many smaller banks; more open pages |
| Advantage | Cost | Cost | Fewer misses in "hot bank" | Precharge always hidden; full array bandwidth utilized | Narrow bus; smaller incremental granularity |
| Disadvantage | | | Area (3-6%); controller complexity; more misses on purely linear accesses | Area (5-8%); more conflict misses | Area (10%); sense amps shared between adjacent banks |
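The latency row converts clock cycles to nanoseconds through each interface's clock period, which is why DRDRAM's larger cycle counts land in a similar nanosecond range. A short sketch checking the 200 MHz DDR2 variants and the 400 MHz DRDRAM channel; the list layout and function name are illustrative.

```python
from typing import List, Tuple

# (architecture, clock MHz, (min cycles, max cycles)) from the comparison table
ROWS: List[Tuple[str, int, Tuple[float, float]]] = [
    ("DDR2", 200, (3.5, 9.5)),
    ("DDR2_VC", 200, (2.5, 18.5)),
    ("DDR2_EMS", 200, (3.5, 9.5)),
    ("DRDRAM", 400, (14, 32)),
]


def cycles_to_ns(cycles: float, clock_mhz: int) -> float:
    """Latency in ns = cycles x clock period, where the period is 1000 / MHz ns."""
    return cycles * 1000.0 / clock_mhz


for name, mhz, (lo, hi) in ROWS:
    print(f"{name}: {cycles_to_ns(lo, mhz):.1f} : {cycles_to_ns(hi, mhz):.1f} ns")
```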
Comparison of Controller Policies
● Close-Page Auto Precharge (CPA)
  ❍ After each access, the data in the sense amps is discarded
  ❍ Advantage: a subsequent access to a different row/page incurs no precharge latency
  ❍ Disadvantage: a subsequent access to the same row/page must repeat the full access
● Open-Page (OP)
  ❍ After each access, the data in the sense amps is retained
  ❍ Advantage: a subsequent access to the same row/page is a page-mode access
  ❍ Disadvantage: an adjacent access to a different row/page incurs precharge latency
● EMS considerations
  ❍ No-Write-Transfer mode: how to identify write-only streams or rows
● Virtual Channel (VC) considerations
  ❍ How many channels can the controller manage?
  ❍ Writeback of dirty virtual channels
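The CPA/OP trade-off can be made concrete by pricing each access in a stream. A sketch under assumed timings (precharge tRP, activate tRCD, and CAS latency CL of 3 cycles each; hypothetical values, not taken from any device in this study), for a single bank, assuming CPA's auto-precharge is always fully hidden between accesses.

```python
from typing import List

T_RP, T_RCD, CL = 3, 3, 3  # assumed cycle counts


def latency_cpa(rows: List[int]) -> List[int]:
    """Close-page auto precharge: every access activates; precharge is assumed hidden."""
    return [T_RCD + CL for _ in rows]


def latency_op(rows: List[int]) -> List[int]:
    """Open-page: row hits pay CL only; row misses pay precharge + activate + CL."""
    out: List[int] = []
    open_row = None
    for row in rows:
        if row == open_row:
            out.append(CL)                  # page-mode access
        elif open_row is None:
            out.append(T_RCD + CL)          # bank idle and already precharged
        else:
            out.append(T_RP + T_RCD + CL)   # must close the old row first
        open_row = row
    return out


for name, stream in {"same row": [4, 4, 4, 4], "alternating rows": [4, 9, 4, 9]}.items():
    print(name, "CPA", sum(latency_cpa(stream)), "OP", sum(latency_op(stream)))
# same row CPA 24 OP 15
# alternating rows CPA 24 OP 33
```

Neither policy wins everywhere: OP pays off when adjacent accesses stay in a row, CPA when they scatter across rows, which is why the best policy depends on both the architecture and the workload.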
Execution Driven Simulation
❍ SimpleScalar: standard processor simulation tool
❍ Advantages
  ● Feedback from DRAM latency
  ● System parameters are easy to modify with full reliability
  ● Confidence in results can be very high
❍ Disadvantages
  ● SLOW to execute
  ● Limited to architectures that SimpleScalar can simulate
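[Figure: execution-driven simulation. Compiled binaries drive a system of CPU, primary cache, backside bus, secondary cache, frontside bus, north-bridge chipset with DRAM controller, DRAM bus, and DRAM system, plus other chipset devices and I/O systems. Regions are labeled "SimpleScalar" and "Not Modeled".]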
Trace Driven Simulation
❍ Advantages
  ● FAST to simulate
  ● Allows traces from SMPs or more complex architectures
  ● Appropriate for model verification and hit-rate studies
❍ Disadvantages
  ● No feedback from an access to subsequent accesses
  ● Without timestamps, essentially a limit-study framework
  ● Not appropriate for time-based results
  ● Simulation parameters limited to those of the system on which the traces were gathered
[Figure: trace-driven simulation. Input: front-side-bus-level, graphics, and I/O accesses. Blocks: frontside bus, north-bridge chipset, DRAM controller, DRAM bus, DRAM system, other chipset devices and I/O systems.]
Results
● Execution-driven simulation based on:
  ❍ SimpleScalar
    ● Version 2.0 MSHR: written by Todd Austin, modified by Doug Burger, customized for these simulations
    ● 8-way superscalar, 2 memory ports
    ● 32 KB split L1 I/D caches
    ● 256 KB unified L2
    ● 16 MSHRs support concurrent memory accesses
● Trace-driven simulation based on:
  ❍ IBM OLTP (On-Line Transaction Processing) traces
    ● 1-way or 8-way SMP; elements are cache snoop data
  ❍ Transmeta Crusoe processor running Windows applications
    ● Includes processor, AGP graphics, and I/O as access sources
● DRAM and controller models
  ❍ SDRAM model (PC100 through DDR133)
  ❍ DRDRAM model
  ❍ DDR2 model (standard, VC, and EMS)
[Figure: SPEC Benchmarks Runtime. Runtime in seconds per benchmark (cc1, compress, go, ijpeg, li, linear_walk, mpeg2dec, mpeg2enc, pegwit, perl). Series: pc100, ddr133, drd, ddr2, ddr2ems, ddr2vc.]
[Figure: Bandwidth Benchmarks Runtime. Runtime in seconds per benchmark (random_walk, stream, stream_no_unroll). Series: pc100, ddr133, drd, ddr2, ddr2ems, ddr2vc.]
[Figure: Stream Execution Time. Execution time in seconds versus processor speed (10 GHz, 5 GHz, 1 GHz). Series: pc100, ddr133, drd, ddr2, ddr2ems, ddr2vc.]
[Figure: Data Bus Utilization. Utilization as a fraction per benchmark (cc1, compress, go, ijpeg, li, linear_walk, mpeg2dec, mpeg2enc, pegwit, perl, random_walk, stream, stream_no_unroll). Series: pc100, ddr133, ddr2, ddr2ems, ddr2vc.]
[Figure: Adjacent Accesses to Same Bank. Fraction per trace (oltp1w, oltp8w, xm_access, xm_cpumark, xm_gcc, xm_quake). Series: ddr2_cpa, ddr2_cpa_remap, ddr2_op, ddr2_op_remap, ddr2ems, ddr2ems_remap, ddr2vc, ddr2vc_remap.]
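The "Adjacent Accesses to Same Bank" chart's metric, the fraction of consecutive accesses that target the same bank, is straightforward to compute from a trace, and remapping changes it by changing which address bits select the bank. The slides do not specify the remapping function used; the sketch below assumes a common XOR-based scheme (bank bits XORed with low row bits) and a hypothetical row | bank | column bit layout, purely to illustrate the measurement.

```python
from typing import List

COL_BITS, BANK_BITS = 10, 2  # hypothetical layout: | row | bank | column |
BANK_MASK = (1 << BANK_BITS) - 1


def bank_of(addr: int, remap: bool = False) -> int:
    bank = (addr >> COL_BITS) & BANK_MASK
    if remap:
        row = addr >> (COL_BITS + BANK_BITS)
        bank ^= row & BANK_MASK  # XOR low row bits into the bank index (assumed scheme)
    return bank


def adjacent_same_bank(addrs: List[int], remap: bool = False) -> float:
    """Fraction of consecutive access pairs that map to the same bank."""
    banks = [bank_of(a, remap) for a in addrs]
    pairs = list(zip(banks, banks[1:]))
    return sum(a == b for a, b in pairs) / len(pairs)


trace = [i << (COL_BITS + BANK_BITS) for i in range(8)]  # one-row stride: same bank, new row
print(adjacent_same_bank(trace), adjacent_same_bank(trace, remap=True))  # 1.0 0.0
```

For a fixed row, XOR with a constant only permutes the bank indices, so the remapping stays one-to-one while spreading row-strided conflicts across banks.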
[Figure: Hit Rates. Fraction per trace (oltp1w, oltp8w, xm_access, xm_cpumark, xm_gcc, xm_quake). Series: ddr2_cpa, ddr2_op, ddr2ems_cpa, ddr2vc_cpa.]
[Figure: Adjacent Accesses: Remapping Effectiveness. Fraction per DRAM architecture (ddr2_cpa, ddr2_op, ddr2ems, ddr2vc).]
[Figure: Average Latency. Average latency in nanoseconds per benchmark (cc1, compress, go, ijpeg, li, linear_walk, mpeg2dec, mpeg2enc, pegwit, perl, random_walk, stream, stream_no_unroll). Series: ddr2_cpa, ddr2_cpa_inv, ddr2_op, ddr2ems_cpa, ddr2ems_cpa_inv, ddr2ems_op, ddr2vc_cpa.]
Conclusions
● More bandwidth is available, but at a cost
● The target for architectural improvements must be latency
● The controller can significantly affect average latency
● DDR2 is evolutionary, but provides the required performance
● Cache-enhanced DRAM can improve performance, but the price of the improvement depends on market penetration
● Packetized interfaces incur increased latency
Future Work
● VC controller performance
  ❍ Cache-line allocation policies for channels
  ❍ When to write back dirty channels to avoid the maximal penalty
  ❍ Price/performance of the controller
● EMS controller performance
  ❍ When to use No-Write-Transfer
● Controller onto processor die
● Embedded DRAM architectures
● SMP primary memory partitioning
Conventional DRAM
❍ Basic DRAM core (memory array) used in all DRAM memories
❍ Delay is propagation through all circuits, with no pipelining
❍ Memory array size is limited by bit-line capacitance
❍ The remainder of the row, accessed but not used, is discarded
[Figure: conventional DRAM block diagram. Memory array with bit lines, sense amps/word drivers, row decoder, column decoder, row buffer, column buffer, data in/out buffers, and clock and refresh circuitry; external signals: data, rd/wr, ras, cas, address.]
Conventional DRAM Upgrades
● Fast Page Mode (FPM) DRAM
  ❍ Eliminates the RAS transition between consecutive accesses to the same row
  ❍ Uses the sense-amp contents as a cache
● Extended Data Out (EDO) DRAM
  ❍ A latch is added between the sense amps and the output drivers
  ❍ Allows parallel operation of two DRAM components
    ● Output drivers operate while the next access proceeds
    ● Memory array activity (precharge or access) is partially overlapped
● Burst EDO DRAM
  ❍ Burst capability for accessing large contiguous segments of a row
  ❍ Toggling CAS advances to the next datum in the burst
Conventional (FPM) DRAM Interface
❍ Dedicated interface: only a single transaction at any time
❍ Address bus multiplexed between row and column
❍ All signals required by the DRAM array are provided by the DRAM controller
Synchronous DRAM (SDRAM)
❍ All I/O is made synchronous rather than asynchronous
❍ 66 MHz SDRAM → PC100 → DDR133 (PC2100)
❍ Very low overhead: latches for address and data
[Figure: SDRAM block diagram. DRAM array with bit lines, sense amps/word drivers, row decoder, column decoder, I/O buffers, address buffer, read and write registers, control signal logic, and clock generator; external signals: data, ras, cas, address, chip sel.]
Interleaved Memory
❍ Relatively uncommon
❍ Used to get concurrency from asynchronous DRAM
[Figure: interleaved memory. Banks 0 through N with a data bus and individual address and control signals per bank.]
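Low-order interleaving is one common way to spread consecutive words across banks so that their slow asynchronous accesses overlap. A sketch assuming word-granularity, low-order interleaving and hypothetical timings (60 ns bank access, one new access started at most every 10 ns); the slides do not say which mapping interleaved systems used, and the function names are illustrative.

```python
from typing import Tuple


def interleave(addr: int, num_banks: int) -> Tuple[int, int]:
    """Low-order interleaving: word addr goes to bank addr % N at offset addr // N."""
    return addr % num_banks, addr // num_banks


def sequential_read_ns(num_words: int, num_banks: int, access_ns: int, issue_ns: int) -> int:
    """Time to read consecutive words when each bank is busy access_ns per access
    and the controller can start at most one access every issue_ns."""
    bank_free = [0] * num_banks
    next_issue = 0
    for addr in range(num_words):
        bank, _ = interleave(addr, num_banks)
        start = max(next_issue, bank_free[bank])  # wait for an issue slot and for the bank
        bank_free[bank] = start + access_ns
        next_issue = start + issue_ns
    return max(bank_free)


print(sequential_read_ns(8, 1, access_ns=60, issue_ns=10))  # 480
print(sequential_read_ns(8, 4, access_ns=60, issue_ns=10))  # 150
```

With one bank, every access waits for the previous one; with four, up to four accesses are in flight, which is the concurrency interleaving buys at the cost of separate address and control signals per bank.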
Direct Rambus (DRDRAM) Device Architecture