Design Space Exploration for 3D Architectures
Yuan Xie
The Pennsylvania State University
Department of Computer Science & Engineering
http://www.cse.psu.edu/~yuanxie Email: [email protected]
SEMATECH 3D workshop, Albany, NY, 10/11-12, 2007
Yuan Xie, Penn State University, SEMATECH 3D workshop, 10/11-12, 2007
How to Explore 3D Design Space
• Early design analysis tools for 3D IC designs, to study the tradeoff between the number of layers vs. performance and power density (thermal)
• EDA design tools for 3D IC designs
  • Floorplanning/placement & routing
  • Should be “thermal-aware”
• Rethink circuit/architecture design in 3D space
  • “True” 3D components, not simply stacking 2D blocks onto multiple layers
  • Performance
  • New 3-D architectures
3D Research @ PennState
• Early Design Analysis Tools
  • 3DCACTI: 3D Cache Energy and Performance Estimator
  • HS3D: 3D Thermal Estimation Tool
• Physical Design Tools for 3D IC designs
  • Should be thermal-aware
• Architecture/Circuit redesign
  • Component designs/architecture designs
  • 3D network-on-chip
3D Divided Wordline (3DWL)
[Figure: a 2D cache array of eight blocks (Blk0 to Blk7), each with a wordline decoder and driver (WLDec&Dr0 to WLDec&Dr7) and MUX & SA, labeled 256 x BLs and 128 x WLs, with a WL pre-decoder, address inputs, and data outputs. In the 3DWL version the array is split across two layers (Blk0-1 to Blk7-1 and Blk0-2 to Blk7-2), each labeled 128 x BLs and 128 x WLs, with local wordline decoders and drivers (LWLDec&Dr) and per-layer sense amplifiers (SA).]
3D Divided Bitline (3DBL)
[Figure: the 2D cache array of eight blocks (Blk0 to Blk7), each with a wordline decoder and driver (WLDec&Dr0 to WLDec&Dr7) and MUX & SA, labeled 256 x BLs and 128 x WLs, with a WL pre-decoder, address inputs, and data outputs.]
3DCacti: 3D cache delay-energy estimator
• Enhancements from original CACTI: delay model, layout parameters, technology parameters
• Download: http://www.cse.psu.edu/~mdl/download
[Figure: chart of delay (ns), 0 to 1.6, versus 3D partitioning (Nx*Ny): 1x1, MLBS, 2x1, 1x2, 4x1, 2x2, 1x4, 8x1, 4x2, 2x4, 1x8. Delay components: output, SA, BL, WL-charge, WL_driver, decoder, predec_driver. Cache configuration: C = 1 MB, B = 16 bytes, A = 4.]
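The Nx*Ny labels on the partitioning axis can be enumerated with a short sketch. It assumes, since the slide does not say so, that Nx is the number of layers each wordline is divided across, Ny is the number of layers each bitline is divided across, and Nx * Ny equals the layer count. The function names are hypothetical and are not part of 3DCacti.

```python
def partitionings(max_layers=8):
    """List (Nx, Ny) pairs with Nx * Ny == layers, for layers = 1, 2, 4, ..."""
    pairs = []
    layers = 1
    while layers <= max_layers:
        nx = layers
        while nx >= 1:
            pairs.append((nx, layers // nx))
            nx //= 2
        layers *= 2
    return pairs


def per_layer_subarray(wordlines, bitlines, nx, ny):
    """Sub-array size on one layer.

    Assumption: dividing a wordline across nx layers leaves bitlines / nx
    bitlines per layer; dividing a bitline across ny layers leaves
    wordlines / ny wordlines per layer.
    """
    return wordlines // ny, bitlines // nx


labels = [f"{nx}x{ny}" for nx, ny in partitionings()]
print(labels)

# The 3DWL figure: a 128-wordline by 256-bitline array split over two layers.
print(per_layer_subarray(128, 256, nx=2, ny=1))
```

Under these assumptions the 2x1 case gives 128 wordlines by 128 bitlines per layer, which agrees with the labels in the 3DWL figure.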
HS3d: Thermal Estimation for 3D IC
• New thermal estimation library HS3d for 3D
• Based on Hotspot 2.0 from UVA
• Optimized routines allow over 1000X speedup for large designs
• Allows estimation with arbitrary resolution
• Supports multi-layer wafer stacks
• Verified by comparison to FEM model
• Same mathematical model as Hotspot
Thermal Estimation with HS3d
3D Research @ PennState
• Early Design Analysis Tools
  • 3DCACTI: 3D Cache Energy and Performance Estimator
  • HS3D: 3D Thermal Estimation Tool
• Physical Design Tools for 3D IC designs
  • Thermal-aware 3D floorplanning
  • IBM 3D Design Flow
• Architecture/Circuit redesign
  • Component designs/architecture designs
  • 3D network-on-chip
Using HS3D for Design Evaluation
• Core-over-Core
• Core-over-Cache
• Side-by-side vertically integrated cores
True 3D Components
Example: Arithmetic Unit Design

Delay (ns)
            Array multiplier   Booth multiplier   Kogge-Stone adder   Brent-Kung adder
            (8 bit)            (9 bit)            (32 bit)            (32 bit)
2D          5.64               4.54               1.56                2.2
2 layers    4.46               3.88               1.41                1.87
3 layers    4.1                3.24               1.32                1.75
4 layers    3.79               3.23               1.31                1.74

Performance improvement
            Array multiplier   Booth multiplier   Kogge-Stone adder   Brent-Kung adder
2 layers    21%                15%                10%                 15%
3 layers    27%                29%                15%                 20%
4 layers    33%                29%                16%                 21%

Observation: The benefits diminish as the number of layers increases
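The improvement percentages on this slide are consistent with the relative delay reduction against the 2D design, (delay_2D - delay_3D) / delay_2D. The sketch below recomputes them from the slide's delay values. The assignment of values to units follows the order in which they appear in the slide text, which is an assumption, and the function name is illustrative.

```python
# Delay in ns, taken from the slide. The unit-to-column assignment is assumed.
DELAY_NS = {
    "Array multiplier (8 bit)":   {"2D": 5.64, 2: 4.46, 3: 4.10, 4: 3.79},
    "Booth multiplier (9 bit)":   {"2D": 4.54, 2: 3.88, 3: 3.24, 4: 3.23},
    "Kogge-Stone adder (32 bit)": {"2D": 1.56, 2: 1.41, 3: 1.32, 4: 1.31},
    "Brent-Kung adder (32 bit)":  {"2D": 2.20, 2: 1.87, 3: 1.75, 4: 1.74},
}


def improvement_percent(delay_2d, delay_3d):
    """Relative delay reduction of a 3D design versus the 2D baseline."""
    return 100.0 * (delay_2d - delay_3d) / delay_2d


improvements = {
    name: [round(improvement_percent(d["2D"], d[layers])) for layers in (2, 3, 4)]
    for name, d in DELAY_NS.items()
}

for name, row in improvements.items():
    # Gain from each additional layer; it shrinks as layers are added.
    steps = [row[0], row[1] - row[0], row[2] - row[1]]
    print(f"{name}: {row} % at 2/3/4 layers, added per step: {steps}")
```

For every unit the gain from going from three to four layers is smaller than the gain from the first split into two layers, which is the diminishing benefit noted in the observation.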
3D Microprocessor Design
• 3D design allows either
  1. Larger and more complex designs, or
  2. Faster designs
• 11% average IPC speed-up
• Ref: VLSI Design 2007, IEEE Micro 2007
[Figure labels: Issue Logic, Adder, Shifter. © PSU]
A Network-on-Chip Solution!
• Each tile includes a small core and a router connecting the core to an on-chip network
Courtesy: C. Nicopoulos
Network-on-Chip (NoC) Basics
• Processing Elements (PEs) interconnected via a packet-based network
[Figure: processing elements (CPU, VIDEO, MPEG, Audio), each attached through a NIC to a router (R); routers are connected by b-bit links.]
NUCA: Non-Uniform Cache Access
Trends:
• Interconnect delay becomes important
• The size of L2 cache continues to increase
Result: Access to different blocks in cache will have different latencies (non-uniform)
Non-uniform Cache Architectures (NUCA)
• SNUCA: statically partitions addresses across the banks
• DNUCA: dynamically migrates cache lines
  • CMP-DNUCA (D. Wood et al.)
  • CMP-NuRapid (Vijaykumar et al.)
[Figure labels: Bank 0, Bank 1, Cache Controller, switch, address/data lines]
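A static partition of addresses across banks can be sketched as follows. The slide does not give the mapping, so this assumes low-order interleaving of cache lines across banks, a 64-byte line, and 256 banks. All three are illustrative choices, and `snuca_bank` is a hypothetical name.

```python
LINE_BYTES = 64   # assumed cache-line size
NUM_BANKS = 256   # assumed bank count


def snuca_bank(address, line_bytes=LINE_BYTES, num_banks=NUM_BANKS):
    """Static mapping: the bank is a fixed function of the address alone.

    Assumption: consecutive cache lines are interleaved across banks.
    """
    line_number = address // line_bytes
    return line_number % num_banks


# Every byte of one line maps to the same bank, and the next line maps
# to the next bank.
print(snuca_bank(0), snuca_bank(63), snuca_bank(64))
# After NUM_BANKS lines the mapping wraps around to bank 0.
print(snuca_bank(LINE_BYTES * NUM_BANKS))
```

Because the bank is fixed by the address, the latency of a line depends only on where its bank sits relative to the requester. A dynamic scheme instead moves lines between banks at run time.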
Proposed: 3D Network-in-Memory
• dTDMA Bus (Dynamic Time-Division Multiple Access)
[Figure: each node is a processing element (L2 cache bank or CPU) attached through a NIC to a single-stage router (R) with b-bit links. A pillar node adds an input buffer, an output buffer, and a NoC/bus interface that connects the router to a b-bit dTDMA bus, the communication pillar, which runs orthogonal to the slide.]
The dTDMA Bus as the Communication Pillar
• Use dTDMA bus (VLSID 2006)
  • Efficient/fast bus
  • Small area/power overhead
• Do not use multi-hop for vertical communication: the vertical distance is so small
[Figure labels: Router, dTDMA Bus Arbiter, layers; dimensions 1500 um and 10~100 um]
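The slide does not describe how the dTDMA arbiter assigns the bus. The sketch below contrasts a static TDMA schedule with a simplified dynamic one, under the assumption that the arbiter keeps one time slot per currently active transmitter and cycles through them. The function names and the eight-client example are hypothetical.

```python
def static_tdma_owner(cycle, num_clients):
    """Static TDMA: every client owns a fixed slot, active or not."""
    return cycle % num_clients


def dtdma_owner(cycle, active_clients):
    """Simplified dynamic TDMA: slots exist only for active clients.

    Assumption: the frame length equals the number of active clients.
    """
    if not active_clients:
        return None
    slots = sorted(active_clients)
    return slots[cycle % len(slots)]


def utilization(owners, active_clients):
    """Fraction of cycles in which the slot owner has data to send."""
    return sum(owner in active_clients for owner in owners) / len(owners)


active = {1, 5}  # two of eight clients want the bus
static_owners = [static_tdma_owner(c, 8) for c in range(8)]
dynamic_owners = [dtdma_owner(c, active) for c in range(8)]
print(utilization(static_owners, active), utilization(dynamic_owners, active))
```

In this example the static schedule leaves six of eight slots idle, while the dynamic schedule alternates between the two active clients and wastes none.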
3D Benefit: Increased Locality
[Figure: nodes within 1 hop, 2 hops, and 3 hops of a CPU, comparing the 2D vicinity with the 3D vicinity reached through a dTDMA pillar.]
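The locality gain can be estimated by counting the nodes reachable within a given number of hops. This sketch assumes a regular mesh large enough that edges do not matter, a CPU located on a pillar node, and a pillar crossing that costs one hop to reach any other layer. None of these details are stated on the slide, and `nodes_within` is a hypothetical helper.

```python
def nodes_within(hops, layers=1):
    """Count nodes reachable from the CPU at (0, 0, layer 0) within `hops`.

    In-layer distance is the Manhattan distance on the mesh. Reaching any
    other layer costs one extra hop over the pillar at (0, 0).
    """
    count = 0
    for z in range(layers):
        budget = hops - (0 if z == 0 else 1)
        for x in range(-hops, hops + 1):
            for y in range(-hops, hops + 1):
                if (x, y, z) == (0, 0, 0):
                    continue  # do not count the CPU node itself
                if abs(x) + abs(y) <= budget:
                    count += 1
    return count


for h in (1, 2, 3):
    print(h, nodes_within(h), nodes_within(h, layers=2))
```

Under these assumptions a two-layer stack raises the number of nodes within three hops from 24 to 37.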
Experimental Setup
• Simulation framework
  • Simics (running Solaris 9, in-order single-issue processor)
  • 3D NoC simulator
• Benchmark
  • SPEC OMP
• Default configuration parameters
  • 2-layer, 8-CPU, 8-pillar 3D integration
  • 64 KB, 2-way L1 split I/D cache
  • L2 cache: 16 MB (256 x 64 KB)
Average L2 Hit Latency
[Figure: bar chart of average L2 hit latency (cycles), 0 to 40, for ammp, apsi, art, equake, fma3d, galgel, mgrid, swim, and wupwise, comparing CMP-DNUCA, CMP-DNUCA-3D, and CMP-SNUCA-3D.]
Impact of the Number of Pillars
[Figure: bar chart of average L2 hit latency (cycles), 0 to 25, for ammp, apsi, art, equake, fma3d, galgel, mgrid, swim, and wupwise, comparing 8 pillars, 4 pillars, and 2 pillars.]
Impact of the Number of Layers
[Figure: bar chart of average L2 hit latency (cycles), 0 to 25, for ammp, apsi, art, equake, fma3d, galgel, mgrid, swim, and wupwise, comparing 2-layer and 4-layer configurations.]
3D Problem: Increased Power Density
• Use 3D thermal estimation tool to guide placement
• Thermal-aware placement to reduce the thermal impact
• CPUs offset in all three dimensions to avoid hotspots
[Figure labels: Cache Bank Node, CPU Node]
Peak temperature:
  2D: Tpeak = 111 C
  3D: Tpeak = 173 C
  3Dopt: Tpeak = 119 C
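Offsetting CPUs can be checked with a simple geometric test before running a thermal estimator. The sketch below is only a proxy: it flags CPUs that sit directly above one another and reports the smallest CPU-to-CPU distance, and it does not estimate temperature. The coordinates and function names are hypothetical and do not reproduce the placement on the slide.

```python
from itertools import combinations


def vertically_stacked(cpus):
    """Pairs of CPUs that share an (x, y) position on different layers."""
    return [(p, q) for p, q in combinations(cpus, 2)
            if p[:2] == q[:2] and p[2] != q[2]]


def min_cpu_separation(cpus):
    """Smallest Manhattan distance over (x, y, layer) between two CPUs."""
    return min(sum(abs(a - b) for a, b in zip(p, q))
               for p, q in combinations(cpus, 2))


# Eight CPUs on two layers, given as (x, y, layer).
stacked = [(x, 0, z) for x in range(0, 8, 2) for z in range(2)]
offset = ([(x, 0, 0) for x in range(0, 8, 2)]
          + [(x + 1, 2, 1) for x in range(0, 8, 2)])

print(len(vertically_stacked(stacked)), min_cpu_separation(stacked))
print(len(vertically_stacked(offset)), min_cpu_separation(offset))
```

A placement that passes this check would still need to be evaluated with a thermal estimation tool, since peak temperature also depends on power and on the heat path to the heat sink.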
Conclusions
• 3D integration has great potential to mitigate the interconnect crisis
• EDA tool support is important for thermal-aware 3D design space exploration
• Architects built high-rises. How can computer architects take advantage of 3D integration?
Acknowledgement
• Collaborators
  • Faculty: N. Vijaykrishnan, M. J. Irwin, C. Das, Jason Cong (UCLA)
  • Students: C. Nicopoulos, Tom Richardson, Yuh-fang Tsai, Feng Wang, Xiaoxia Wu, Paul Falkenstern, B. Vaidyanathan
• Advice and industrial support from IBM: Kerry Bernstein, Ruchir Puri, Albert Young, Mike Ignatowski
• Current 3D research is supported by DARPA/IBM
More Details
• X. Wu, P. Falkenstern, Y. Xie. Scan Chain Designs in 3D IC. IEEE International Conference on Computer Design (ICCD 2007).
• G. Loh, Y. Xie, B. Black. 3D Processor Design. IEEE Micro, 2007.
• J. Kim, C. A. Nicopoulos, D. Park, R. Das, Y. Xie, N. Vijaykrishnan, C. R. Das. A Novel Dimensionally-Decomposed Router for On-Chip Communication in 3D Architectures. International Symposium on Computer Architecture (ISCA 2007).
• B. Vaidyanathan, W.-L. Hung, F. Wang, Y. Xie, V. Narayanan, M. J. Irwin. Architecting Microprocessor Components in 3D Design Space. International Conference on VLSI Design 2007.
• Y. Xie, G. Loh, B. Black, K. Bernstein. Design Space Exploration for 3D Architecture. ACM Journal of Emerging Technologies for Computer Systems 2(2):65-103.
• F. Li, C. Nicopoulos, T. Richardson, Y. Xie, N. Vijaykrishnan, M. Kandemir. Design and Management of 3D Chip Multiprocessors using Network-in-memory. Proceedings of the International Symposium on Computer Architecture (ISCA'06). pp. 130-141.
• W. Hung, G. Link, Y. Xie, N. Vijaykrishnan, M. J. Irwin. Interconnect and Thermal-aware Floorplanning for 3D Microprocessors. Proceedings of the International Symposium on Quality Electronic Design (ISQED 2006). pp. 98-104.
• O. Ozturk, F. Wang, M. Kandemir, Y. Xie. Optimal Topology Exploration for Application-Specific 3D Architectures. Proceedings of the Eleventh Asia and South Pacific Design Automation Conference (ASP-DAC 2006). pp. 390-395.
• Y-F. Tsai, Y. Xie, N. Vijaykrishnan, M. J. Irwin. Three-Dimensional Cache Design Exploration Using 3DCacti. Proceedings of the IEEE International Conference on Computer Design (ICCD 2005). pp. 519-524.
http://www.cse.psu.edu/~yuanxie/3d.html
“…Space on the ground is running out…The only way is up…The question is not whether this reach for the sky will happen - it's happening now - but what it portends.”
Hugh Pearman, May 2004 (The Sunday Times, UK)