single pipeline performance - eecs · advantages and disadvantages of each design ... • the rtl...
TRANSCRIPT
• Reliability is a big concern in Multi-core Processors as the technology nodes are scaling
• Inter-Core resource sharing increases reliability in presence of fault (e.g. StageNet Architecture)
• This leads to high communication delay• 3D integration is a promising solution to increase density and
performance
Introduction
Objective
Architecture
Implementation
Conclusion
MaPnet: A Three-Dimensional Fabric for Reliable Multi-core ProcessorsJavad Bagherzadeh, Sugandha Gupta, Byounchan Oh
University of Michigan, Ann Arbor
• 3D architectures have performance advantage in many core designs (Figure 6)
• MAPnet design is implemented on a 4-core architecture with simple cores
• TSV delay and behavior is simulated by SPICE simulations and used to determine routing delay and processor frequency
Table 1. Advantages and disadvantages of each design
Results
• Use StageNet based structure with two functionalities in case of failure Arranging healthy resources in different cores to create one
healthy pipeline (Virtual Pipeline) Sharing healthy resources in other cores when they are not
being used • Use 3D integration to reduce communication delay and hence
improve performance over 2D design • Model Through-Silicon Vias (TSV) to simulate the 3D design and
performance
Advantages Disadvantages
Regular Design -Easy To Design -Slow
-Low Scalability
Pipelined Design -Higher clock Frequency
-Easy Dynamic
Reconfiguration
-High Flexibility in Sharing
Resources
-Better scalability
-Difficult to Design
-Need extra control units
-Area Overhead
2D-Metal line delay
3D-TSV delay
MAPnet Design 2D Core-based 3D Pipeline 3D
Clock
(No Fault)
2.5 ns
(400 MHz)
2.5 ns
(400 MHz)
2.5 ns
(400 MHz)
Clock
(Fault Condition)
7 ns
(143 MHz)
3.95 ns
(250MHz)
3.85
(260MHz)
Footprint Area 1200*1200 sq.us 600*600 sq. us
Layout Density 71% 67%
Scenario Number of Faults Disabled Resources
Scenario 1 2 IF0, EX1
Scenario 2 4 IF0, ID1, EX2, WB3
Scenario 3 10 IF0, ID1, EX0, MEM1,
WB0, IF2, ID3, EX2,
MEM3, WB2
• Both 3D designs show performance improvement over the 2D design
• The throughput increases by 1.74X• On comparing the post-APR frequencies in
faulty and healthy cores and considering Table 1, we decided to pipeline both architectures to reduce clock frequency
• Crossbars in 2D and 3D structures need 1 and 2 pipeline stages respectively.
• The RTL was synthesized using Synopsys Design Compiler and Cadence Encounter was used to create the layout
• IPC was measured for each scenario
• Figures 8 reflect the results for baseline 2D and 3D pipelined structures regarding sharing the same resource and no sharing.
• Each fault forces the processor to reconfigure and adds extra delays cycles into pipeline structure to use resources in other cores
• We simulated a single core for different test cases and measured the IPC for extra delay cycles added in pipeline structure which is reflected in Figure 7.
• Improvement with 2D and 3D StageNet compared 2 baseline
• 16.3% on average and up to 28.2% improvement in 3D compared to 2D
• area of our 3D design is under than 25% of the traditional 2D design due to reduced interconnection complexity.
Figure 2. Three different designs for MAPnet (a) 2D structure with crossbars (b) Core-based 3D structure (c) Pipeline-based 3D structure
(a) (b) (c)
Figure 3. Single core architecture with connection to crossbar and pipeline registers
Figure 5. Layouts for (a) 2D 4-core processor, (b) Basic 1-core and (c) Conceptual 3D 4-core Design
(a)
(c)
(b)
Figure 6. Estimated Clock Period for different number of cores for MAPnet – (a) 2D structure with crossbars (b) Core-based 3D structure (c) Pipeline-based 3D structure
Figure 4. Comparison between routing delay for 2D Metal vs 3D TSVs
Table 2. Clock Period results after Place and Route (using Encounter)
Table 3. shows 3 different fault scenarios for various fault numbers and locations within pipeline stages
ID/EX
IF/ID
MEM/WB
EX/MEM
Inputs from other cores
Outputs to other cores
Inputs from other coresOutputs to other cores
FETCH EXECUTEDECODE MEMORY WRITEBACK
CORE 0
• MaPnet: Using inter-core redundancy to salvage the processor in case of faults. 2D design Core-based 3D design Pipeline-based 3D design
ID 1
EX 3
MEM 1
WB 2
IF 0
CORE 3
IF 3
IF 2
IF 1
ID 0
ID 2
ID 3
EX 0
EX 1
EX 2
MEM 0
MEM 2
MEM 3
WB 0
WB 1
WB 3
CORE 2
CORE 1
CORE 0
Figure 1. A 5-satge pipeline structure for 4 cores and switches between stages
I
F
0
I
D
0
E
X
0
M
E
M
0
W
B
0
I
F
1
I
D
1
E
X
1
M
E
M
1
W
B
1
CROSS-
BAR
I
F
3
I
D
3
E
X
3
M
E
M
3
W
B
3
I
F
2
I
D
2
E
X
2
M
E
M
2
W
B
2
CORE 0 CORE 1
CORE 3CORE 2
0.0
0.2
0.4
0.6
0.8
1.0
IPC
Single Pipeline Performance0-delay 1-delay 2-delay 3-delay 4-delay 5-delay 6-delay 7-delay 8-delay
Figure 7. Single Pipeline Performance
Figure 8. IPC for different fault scenarios
• 3D reconfigurable pipeline design, named MaPNet.
• more-reliable operation, higher performance, lower cost, and/or lower power consumption by taking advantage of the redundancy and capabilities available at each layer in the system stack.
EECS 578 Correct Operation for Processors and Embedded Systems
Fall 2015
0.0
1.0
2.0
3.0
4.0
5.0
IPC
Fault Scenario 1
Baseline No sharing (2D) No sharing (3D)Fully sharing (2D) Fully sharing (3D)
0.0
1.0
2.0
3.0
4.0
5.0
IPC
Fault Scenario 2
Baseline No sharing (2D) No sharing (3D)Fully sharing (2D) Fully sharing (3D)
0.0
1.0
2.0
3.0
4.0
5.0
IPC
Fault Scenario 3
Baseline No sharing (2D) No sharing (3D)Fully sharing (2D) Fully sharing (3D)