Post on 21-Dec-2015

1

Exploring Design Space for 3D Clustered Architectures

Manu Awasthi, Rajeev Balasubramonian

University of Utah

2

3D Technologies

• Multiple layers of active devices
• Vertical interconnects between layers

[Figure: 2D chip (Device Layer 1 on silicon) versus 3D chip (Layer 1 and Layer 2, each a device layer on silicon, joined by a vertical interconnect). Annotation: very small, ~10µm. Courtesy: K. Bernstein, IBM]

3

Benefits of 3D

• Reduction of global interconnect
• Delay/Power reduction
• Bandwidth
• Mixed-technology integration

[Figure: wire segments labeled L]
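A rough sketch of why stacking can shorten global wires. It assumes a square chip of fixed total area split evenly across dies, with Manhattan routing. The function names, the example area, and the use of the ~10µm figure from the 3D Technologies slide as the vertical hop length are assumptions for illustration, not numbers from the talk.

```python
import math

# Assumed vertical hop between adjacent dies, in mm (~10 µm, per the
# annotation on the 3D Technologies slide; its exact meaning is assumed).
LAYER_GAP_MM = 0.010


def die_side(total_area_mm2: float, num_dies: int) -> float:
    """Side length of each square die when the area is split evenly."""
    return math.sqrt(total_area_mm2 / num_dies)


def worst_case_wire(total_area_mm2: float, num_dies: int) -> float:
    """Corner-to-corner Manhattan wire length, plus vertical hops."""
    side = die_side(total_area_mm2, num_dies)
    return 2 * side + (num_dies - 1) * LAYER_GAP_MM


if __name__ == "__main__":
    area = 100.0  # hypothetical chip area in mm^2
    for dies in (1, 2):
        print(dies, "die(s):", round(worst_case_wire(area, dies), 3), "mm")
```

Under these assumptions the horizontal span shrinks by a factor of √n for n dies, while the added vertical distance is negligible in comparison.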

4

Previous Proposals

• Previously in 3D…
  – Break and stack (Folding) [Puttaswamy et al]
• Vertical stacking of active devices

[Figure: a RegFile block, broken and stacked. Annotations: reduced intra-block latency; all are active; HEAT!!!]

5

An alternative approach?

[Figure: 2D chip versus 3D chip (Die 0 and Die 1)]

Prudent stacking can:

• Improve performance
• Result in a better thermal profile

6

Wire Delays and Performance

[Figure: Impact of wire delays. X-axis: extra delay (in clock cycles), 0 to 8. Y-axis: percent slowdown, 0 to 50. Series: DCACHE-INTALU, IQ-INTALU, RENAME-IQ, L1D-L2, BPRED-ICACHE, ICACHE-DECODE, DECODE-RENAME, DCACHE-FPALU, FPALU-INTALU]

7

Clustered Architectures

• Centralized front-end
  – I-Cache & D-Cache
  – LSQ, Rename, Decode
  – Branch Predictor
• Clustered back-end
  – Issue Queue
  – Regfile, FUs

[Figure: labels are Front-End, L1 DCache, Cluster, Crossbar/Router]

Higher clock frequency, high ILP!!

8

Decentralized Cache Banks

[Figure: multiple L1 DCache banks]

Possibly better performance

9

Decentralized Cache Banks

Replicated Cache Banks

[Figure: multiple L1 DCache banks]

10

Decentralized Cache Banks

Word Interleaved Cache Banks

[Figure: two L1 DCache banks, one labeled Odd Words and one labeled Even Words]
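A minimal sketch of how the two decentralized organizations could route an access. The word size, the function names, and the rule that a store to a replicated cache updates every copy are assumptions for illustration; the slides only name the two organizations and show an odd/even word split.

```python
from typing import List

WORD_BYTES = 8  # assumed word size; not stated on the slides
NUM_BANKS = 2   # the word-interleaved slide shows an even bank and an odd bank


def word_interleaved_bank(addr: int) -> int:
    """Bank holding the word that contains byte address `addr`.

    Bank 0 holds even-numbered words, bank 1 holds odd-numbered words.
    """
    return (addr // WORD_BYTES) % NUM_BANKS


def banks_touched(addr: int, is_store: bool, local_bank: int,
                  replicated: bool) -> List[int]:
    """Banks that one access must visit under each organization."""
    if replicated:
        # Assumption: every bank holds a full copy, so loads stay local
        # and stores must update all copies to keep them identical.
        return list(range(NUM_BANKS)) if is_store else [local_bank]
    # Word-interleaved: exactly one bank owns each word.
    return [word_interleaved_bank(addr)]
```

The sketch makes the trade-off visible: replication keeps loads local at the cost of broadcasting stores, while word interleaving sends each access to a single bank that may be remote.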

11

Outline

• Introduction
  – Motivation
  – 3D Architectures
  – Clustered Architectures
• Proposals
• Results
• Conclusions

12

Architecture 1

Cache-on-cluster

[Figure: cache-on-cluster layout across Die 0 and Die 1. Legend: cache bank, cluster, inter-die interconnect, intra-die interconnect]

13

Architecture 2

Cluster-on-cluster

[Figure: cluster-on-cluster layout across Die 0 and Die 1. Legend: cache bank, cluster, inter-die interconnect, intra-die interconnect]

14

Architecture 3

Staggered

[Figure: staggered layout across Die 0 and Die 1. Legend: cache bank, cluster, inter-die interconnect, intra-die interconnect]

15

Outline

• Introduction
  – Motivation
  – 3D Architectures
  – Clustered Architectures
• Proposals
• Results
• Conclusions

16

Experimental Setup

• Framework
  – SimpleScalar, Wattch and HotSpot 3.0
  – Wire model: 8x global metal plane
• Benchmarks
  – SPEC 2K, single threaded
• Processor Configuration
  – 8 Clusters
  – 64 kB L1 I/D Caches, 2-way set-assoc
• L1 Data cache Word-Interleaved or Replicated
• 2D Centralized Cache – Base Case

17

Base Case Performances

[Figure: Performance improvement wrt 2D centralized cache. X-axis: cache bank type (Replicated, WI). Y-axis: performance improvement, 0.0 to 9.0. Annotation: best case 2D config]

18

The 3D Effect

[Figure: Average performance improvement, 3D Replicated vs 2D Centralized. X-axis: Arch 1, Arch 2, Arch 3. Y-axis: percentage improvement over 2D centralized, 0 to 16]

19

The 3D Effect

[Figure: Average performance improvement, 3D WI vs 2D Centralized. X-axis: Arch 1, Arch 2, Arch 3. Y-axis: percentage improvement over centralized, 0 to 25]

20

Comparisons

[Figure, three panels, each with y-axis IPC improvement, 0 to 25:
  – 3D Replicated: average performance improvement wrt 2D Centralized for Arch 1, Arch 2, Arch 3. Annotation: Best Case 3D - Rep
  – 3D WI: average performance improvement wrt 2D Centralized for Arch 1, Arch 2, Arch 3. Annotation: Best Case 3D - WI
  – 2D Case: base case performance comparisons for Replicated and WI. Annotation: Best Case 2D]

12% improvement for best case 3D vs best case 2D
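The charts report each configuration as a percentage improvement over the same 2D centralized baseline, so comparing best-case 3D against best-case 2D means dividing the two speedups rather than subtracting the percentages. The sketch below shows that arithmetic; the function names are illustrative, and the numbers in the usage lines are placeholders, not values read from the slides.

```python
def improvement_pct(ipc: float, baseline_ipc: float) -> float:
    """Percent IPC improvement of a configuration over a baseline."""
    return (ipc / baseline_ipc - 1.0) * 100.0


def relative_improvement_pct(pct_a: float, pct_b: float) -> float:
    """Improvement of A over B, given both as percent improvements
    over the same baseline."""
    speedup_a = 1.0 + pct_a / 100.0
    speedup_b = 1.0 + pct_b / 100.0
    return (speedup_a / speedup_b - 1.0) * 100.0


if __name__ == "__main__":
    # Placeholder inputs only; substitute the measured percentages.
    print(relative_improvement_pct(21.0, 10.0))
```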

21

Thermal Analysis

• Wattch for power numbers
• HotSpot 3.0 for thermal model (grid)
  – 500x500 grid resolution
• Interconnect power modeling
  – Attributed to functional units
  – 8X plane wires
  – Router + Crossbar modeled as separate entity

22

Thermal Profiles

[Figure: Peak temperature of the hottest on-chip unit (Celsius) for Base, Arch 1, Arch 2, Arch 3. Y-axis: 0 to 120]
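A small sketch of how the "hottest on-chip unit" metric could be computed from a grid of temperatures such as a grid thermal model produces. The data layout (a list of rows plus a floorplan of half-open cell rectangles), the function name, and the values in the usage lines are hypothetical. This is not HotSpot's interface or output format.

```python
from typing import Dict, List, Tuple

# (row_start, row_end, col_start, col_end), end indices exclusive
Rect = Tuple[int, int, int, int]


def hottest_unit(grid: List[List[float]],
                 floorplan: Dict[str, Rect]) -> Tuple[str, float]:
    """Return (unit name, peak temperature) for the hottest unit."""
    peaks = {
        name: max(grid[r][c] for r in range(r0, r1) for c in range(c0, c1))
        for name, (r0, r1, c0, c1) in floorplan.items()
    }
    name = max(peaks, key=peaks.get)
    return name, peaks[name]


if __name__ == "__main__":
    # Hypothetical 2x4 grid (Celsius) and two-unit floorplan.
    grid = [[60.0, 62.0, 80.0, 85.0],
            [61.0, 63.0, 82.0, 90.0]]
    floorplan = {"cache_bank": (0, 2, 0, 2), "cluster": (0, 2, 2, 4)}
    print(hottest_unit(grid, floorplan))
```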

23

Outline

• Introduction
  – Motivation
  – 3D Architectures
  – Clustered Architectures
• Proposals
• Results
• Conclusions

24

Conclusions

• Wire delays are critical to performance
  – Some are more important than others.
• Prudent block stacking
  – Performance improvement up to 12% over 2D
    • WI banks + Arch 3 (3D)
  – Better thermal profiles compared to folding

25

Backup Slides

26

4 Cluster Arrangements

[Figure: (a) Arch-1 (cache-on-cluster), (b) Arch-2 (cluster-on-cluster), (c) Arch-3 (staggered), each across Die 0 and Die 1. Legend: cluster, cache bank, intra-die horizontal wire, inter-die vertical wire]
