7 december 2011powering smartphone socs with multichannel tsv drams1 3d mania: powering smartphone...
TRANSCRIPT
Powering Smartphone SoCs with Multichannel TSV DRAMs7 December 2011 1
3D Mania:Powering Smartphone SoCs with
Multichannel TSV DRAMsDrew Wingard, Sonics
7 December 2011 Powering Smartphone SoCs with Multichannel TSV DRAMs 2
Concurrency in Smartphone SoCs
iScan
iQuant
iScan
iQuant iTransiTrans Recon-
struction
Recon-
structionLoop
Filter
Loop
Filter
Intra
Prediction
Intra
Prediction
MC
Prediction
MC
Prediction
Entropy
Decoding
Entropy
Decoding
Bitstream Decoded
FramesiScan
iQuant
iScan
iQuant iTransiTrans Recon-
struction
Recon-
structionLoop
Filter
Loop
Filter
Intra
Prediction
Intra
Prediction
MC
Prediction
MC
Prediction
Entropy
Decoding
Entropy
Decoding
Bitstream Decoded
Frames
H.264 Decode
Audio Decode
Video Out
Transport demux
Smartphone SoCs process data in parallel, but communicate…
7 December 2011 Powering Smartphone SoCs with Multichannel TSV DRAMs 3
Concurrency in Smartphone SoCs
iScan
iQuant
iScan
iQuant iTransiTrans Recon-
struction
Recon-
structionLoop
Filter
Loop
Filter
Intra
Prediction
Intra
Prediction
MC
Prediction
MC
Prediction
Entropy
Decoding
Entropy
Decoding
Bitstream Decoded
FramesiScan
iQuant
iScan
iQuant iTransiTrans Recon-
struction
Recon-
structionLoop
Filter
Loop
Filter
Intra
Prediction
Intra
Prediction
MC
Prediction
MC
Prediction
Entropy
Decoding
Entropy
Decoding
Bitstream Decoded
Frames
H.264 Decode
Audio Decode
Video Out
Transport demux
DRAM
MPSoC '11: Leveraging 3D Multichannel DRAMs © 2011 Sonics, Inc.6 July 2011 4
Smartphone Problems and Opportunities
■ Problems in current systems• In mobile SoCs, DRAM bandwidth needs grow faster than
capacity needs• Scaling DRAM bandwidth requires extra DRAMs
o And power-hungry PHYs
• Wider DRAM interfaces & deeper pipelining increases access granularity, driving need for multichannel approacheso Which causes extra pin costs (after 2 channels)
• Current DRAM interfaces are a bottleneck betweeno Lots of parallel initiators (data clients)o Lots of parallel DRAM banks (data servers)
■ What if we could get rid of the interface bottleneck?• And be limited by DRAM bank bandwidths, instead• Without needing fancy PHYs
MPSoC '11: Leveraging 3D Multichannel DRAMs © 2011 Sonics, Inc.6 July 2011 5
Wide I/O: TSV-enabled Mobile DRAM
■ Pending JEDEC standard (JC 42.6)■ High bandwidth, even at low capacity
• Like 4 (x32) channels of LPDDR2-800• 12.8 GBytes/sec peak bandwidth
■ Lowest power• No PHY (simple drivers/receivers, no PLL/DLL)• Low loading (capacitance and inductance)• Modest frequency (200 MHz)
■ Smallest form factor• 3D stacked based on thin (50µm) TSV-based die
■ Minimal change to DRAM design (ex-TSV)■ Risks are all TSV-related (cost, yield, biz model)■ Smartphone market will drive volumes
MPSoC '11: Leveraging 3D Multichannel DRAMs © 2011 Sonics, Inc.6 July 2011 6
Wide I/O Power Estimates
Sophie Dumas, ST-EJEDEC Mobile Memory ForumJune 2011 (Taiwan)
MPSoC '11: Leveraging 3D Multichannel DRAMs © 2011 Sonics, Inc.6 July 2011 7
Multichannel DRAM System Challenges
Ap
plicatio
n V
iew
Ph
ysical Org
anizatio
nAddress Space
Region 2
Region 1
Hole 1
Region 3
Hole 2
2 ChannelsNo Interleave
Region 1
Hole 1
Region 3
Hole 2
2
2
2
2
1
1
1
1
Region 2
Region 1
Hole 1
Region 3
Hole 2
2
4
2
4
1
3
1
3
Region 2
Region 1
Hole 1
Region 3
Hole 2
Ch. 2
Ch. 1
Region 2
2 ChannelsInterleaved
4 ChannelsInterleaved
Key Problems: ■ Load balancing
• Must balance memory traffic evenly among channels
■ Maintaining throughput• Multiple channels cause
throughput & ordering issues for pipelined memories
Software and IP cores must manage multiple channels explicitly
7 December 2011 Powering Smartphone SoCs with Multichannel TSV DRAMs 8
Locality Challenges: DRAM Access Styles
Video Frame Buffer
2048 Bytes/row
1080 rows/fram
e
0x10F0 0x10F8 0x1100 0x1108
0x08F0 0x08F8 0x0900 0x0908
0x18F0 0x18F8 0x1900 0x1908
0x20F0 0x20F8 0x2100 0x2108
0x28F0 0x28F8 0x2900 0x2908
2-D Data Object in Memory
Video Frame Buffer
2-D DataObject inMemory
DRAM Page 0(on DRAM bank 0)
DRAM Page 0(on DRAM bank 1)
DRAM Page 0(on DRAM bank 2)
DRAM Page 0(on DRAM bank 3)
DRAM Page 1(on DRAM bank 0)
DRAM Page 1(on DRAM bank 2)
DRAM Page N(on DRAM bank 0)
DRAM Page N(on DRAM bank 1)
DRAM Page N+1(on DRAM bank 0)
2048 Bytes/row
16 rows/page
256B/page
1080 rows/fram
e
■ Exploiting spatial locality is key for high utilization• CPUs tend to stay with an O/S page (multiple DRAM pages)• Much processing and I/O DMA uses long incrementing bursts• Image processing is tougher
■ Two-dimensional bursts• 2D transaction using
a single OCP read orwrite command
• Popular for HD videoand graphics
■ Address tiling• Rearrange DRAM
address organizationto exploit locality
• Avoids page misses• One size doesn’t
fit all!
MPSoC '11: Leveraging 3D Multichannel DRAMs © 2011 Sonics, Inc.6 July 2011 9
Interleaved Multichannel Technology (IMT*):Seamless Transition to Multichannel
■ Interleaving support requires splitting (and chopping!) traffic for delivery to proper channel• Splitting in memory controller creates performance and
routing congestion bottleneck■ Predictably high performance
• Automatically spreads traffic across channels to ensure load balancing
• Keeps DRAMs operating at full throughput without costly reorder buffers
■ Scalable architecture• Up to 8 interleaved channels within the same address region• Fully distributed to avoid bottlenecks & placement restrictions
■ Application flexibility• Transparent to software and initiator hardware• Supports full or partial memory configurations – at run time
Multichannel Interleaving in the InterconnectHigher Performance, Lower Area, More Scalable
*patents pending
7 December 2011 Powering Smartphone SoCs with Multichannel TSV DRAMs 10
Transparent Multichannel Interleavingwith Access-Optimized Boundaries
Ap
plicatio
n V
iew
Ph
ysical Org
anizatio
n
Ind
epen
den
t Interleavin
g S
ize Su
pp
ort
7 December 2011 Powering Smartphone SoCs with Multichannel TSV DRAMs 11
MemMax™ Memory Scheduler
Multi-threaded &multi-tagged OCP with
non-blocking flow control
In-order with earlypre-charge/activate
Compiled (SRAM) databuffers decouple rates& cross clock domain
■ Address tiling■ Request grouping■ 2D burst support■ Guaranteed
bandwidth QoS with demotion
■ Per-thread buffer sizing
■ Run time re-programmable
MPSoC '11: Leveraging 3D Multichannel DRAMs © 2011 Sonics, Inc.6 July 2011 12
Complete Wide I/O System Solution
Sonics Interconnect with IMT
Video Pre
Processor
GraphicsEngine
VideoDecoder
VideoPost
Processor
Transport Engine
IA
HostCPU
AudioProcessor
Streaming Processor
Debug Interface
IA IA IA IA IA IA IA IA
Wide I/OController
0
MemMax0
TA
MemMax3
Wide I/OController
3
Wide I/ODRAM
TATATA
GPIO USB FLASH
TA
SRAM
TA
Storage
SOC
TA
■ Channel splitting and load balancing
■ Scheduling for high efficiency & QoS
■ Wide I/O interface
MPSoC '11: Leveraging 3D Multichannel DRAMs © 2011 Sonics, Inc.6 July 2011 14
Summary
■ Smartphone SoC performance dominated by DRAM
■ New JEDEC Wide I/O standard offers compelling performance and power benefits• Assuming TSV-related issues can be managed
■ 3D smartphone SoC solutions require:• Massive connectivity• Transparent multi-channel DRAM load balancing• High-efficiency DRAM scheduling with high QoS• Minimum energy usage
Powering Smartphone SoCs with Multichannel TSV DRAMs7 December 2011 15
THANK YOU!