2009-12-2, Advanced Processor Technology Group, The School of Computer Science
From Channel Slicing to Spatial Division Multiplexing
The asynchronous router design
Wei Song, 03/12/2009
Index
• Channel Slicing
– Asynchronous NoCs and routers
– Channel Slicing
– A wormhole router design
• Spatial Division Multiplexing (SDM)
– Motives
– Switching networks
– 2-stage Clos network
– The distributed scheduler
– Implementation results
Asynchronous NoCs
• GALS (globally asynchronous, locally synchronous)
• Fully asynchronous communication fabric
• QDI (quasi-delay-insensitive) pipelines
• Low dynamic power
• Tolerance to variation
• Fast prototyping
Synchronised QDI Pipelines
[Figure: synchronised QDI pipeline; bus labels 8 and 4; Nangate Cell Lib 65nm, 1-of-4]
Channel Slicing (1)
• Remove the C-element tree
• Sub-channels run independently
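As a rough illustration of the two bullets above, the sketch below models a 1-of-4 encoded channel in Python. In a synchronised pipeline a single acknowledge is produced only once every 1-of-4 sub-channel holds valid data (the job of the C-element tree); with channel slicing each sub-channel acknowledges on its own. The function names and the software model are assumptions made for illustration, not the circuit shown on the slides.

```python
def encode_1of4(value, width_bits):
    """Encode an even-width word as 1-of-4 groups (2 data bits per group)."""
    if width_bits % 2:
        raise ValueError("1-of-4 encoding needs an even number of bits")
    groups = []
    for g in range(width_bits // 2):
        symbol = (value >> (2 * g)) & 0b11
        # Exactly one of the four wires is raised to carry the 2-bit symbol.
        groups.append(tuple(int(wire == symbol) for wire in range(4)))
    return groups


def group_valid(group):
    """A sub-channel holds data when exactly one wire is high."""
    return sum(group) == 1


def synchronised_ack(groups):
    """C-element-tree behaviour: acknowledge only when ALL sub-channels are valid."""
    return all(group_valid(g) for g in groups)


def sliced_acks(groups):
    """Channel slicing: every sub-channel produces its own acknowledge."""
    return [group_valid(g) for g in groups]


word = encode_1of4(0b1101, 4)
word[1] = (0, 0, 0, 0)            # second sub-channel has not received data yet
print(synchronised_ack(word))     # the whole word must wait
print(sliced_acks(word))          # the first sub-channel can proceed alone
```

In this model a slow sub-channel stalls the whole word in the synchronised case, while in the sliced case only that sub-channel waits.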
Channel Slicing (2)
Channel Slicing (3)
[Figure label: sub-channels]
The Wormhole Router
[Figure: wormhole router with 5 input ports (d_i_0/ack_i_0 to d_i_4/ack_i_4), 5 output ports (d_o_0/ack_o_0 to d_o_4/ack_o_4), arbiters and ctl blocks; bus labels 80 and 16; stage labels N, N+1, N+2]
• Faraday 130 nm
• Five 32-bit ports
• 3 routers:
– Synchronised
– Channel Sliced
– Plus lookahead
Area Results
• Channel Slicing: 23%
– extra controllers in input buffer
– increased wire count in crossbar
• Lookahead: 5.3%
– extra AND gates and C2P elements on critical path
Speed Results
• Synchronised: 345 MHz
• Channel Slicing: 450 MHz
• ChSlice+LH: 590 MHz
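To relate these rates to the cycle periods (in ns) plotted on the later data-width slide, the short Python sketch below converts each figure to a period and to a speed-up over the synchronised router. It assumes the quoted MHz values can be read as cycle rates; the dictionary and function names are illustrative only.

```python
# Figures as quoted on the slide; treating them as cycle rates is an assumption.
FREQ_MHZ = {"Synchronised": 345, "Channel Slicing": 450, "ChSlice+LH": 590}


def period_ns(freq_mhz):
    """Cycle period in nanoseconds for a rate given in MHz."""
    return 1000.0 / freq_mhz


def speedup(name, baseline="Synchronised"):
    """Rate of one router relative to the baseline router."""
    return FREQ_MHZ[name] / FREQ_MHZ[baseline]


for name, freq in FREQ_MHZ.items():
    print(f"{name}: {period_ns(freq):.2f} ns, x{speedup(name):.2f}")
```

Under that reading, channel slicing is roughly 30% faster than the synchronised router, and channel slicing with lookahead roughly 71% faster.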
Compare with Other Routers
• Asynchronous cell library: constrains the adaptation to other projects (ANoC, ASPIN)
• Bundled-data: less tolerant to variation (MANGO, QNoC, ASPIN)
• Customized design: design complexity (ASPIN)
Data Width Effect
[Figure: cycle period (ns, 0.0 to 4.0) versus data width of ports (bit, 0 to 140) for ChSlice + LH, Channel Slicing, and Synchronised]
Index
• Spatial Division Multiplexing (SDM)
SDM: Motivation (1)
• The problems that the wormhole router cannot handle:
– QoS, delay and throughput guaranteed services
– Fault-tolerance
– Network efficiency
Motivation (2)
[Figure: router architectures compared. Wormhole / Virtual Channel: input buffer, switch allocator, P×P crossbar, input and output ports 0 to P-1, link width W. SDM: input buffer, switch scheduler, MP×MP switching network, input and output ports 0 to P-1, labels M and W/M]
Motivation (3) – Problems of VC
• Pipelines are synchronised
• Area overhead
• QoS (complicated arbiters)
• TDMA (time slot definition)
• Fault-tolerance (partially faulty link)
[Figure: virtual channel router with input buffer, VC allocator, switch allocator, and P×P crossbar; input and output ports 0 to P-1; labels M and W]
Motivation (4) – Benefits of SDM
• Delay and throughput guarantee
• Fault-tolerance
• Speed (Channel slicing)
• Area
• Link efficiency
– interrupts
[Figure: SDM router with input buffer, switch scheduler, and MP×MP switching network; input and output ports 0 to P-1; labels M and W/M]
Motivation (4) – Problems of SDM
• Area overhead
– $C_{CB} = P^2 \times W$
– $C_{SDM} = M \times P^2 \times W$
• Scheduling Algorithm
– Wormhole (P to 1)
– SDM (MP to M)
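One way to see where the factor M comes from, assuming the cost counts crosspoints multiplied by the wire width of each crosspoint (an assumption that matches the crossbar formula, not something the slide states): the SDM switch is an MP×MP network whose links are W/M wide.

```latex
\begin{align*}
C_{CB}  &= P^2 \times W \\
C_{SDM} &= (MP)^2 \times \frac{W}{M} = M \times P^2 \times W \\
\frac{C_{SDM}}{C_{CB}} &= M
\end{align*}
```

Under this reading, a plain SDM crossbar costs M times as much as the wormhole crossbar for the same port count and link width.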
Index
• Spatial Division Multiplexing (SDM)
– Switching networks
SDM: Switching Networks
• Strictly Non-Blocking (SNB)
– An idle input port and an idle output port are always connectable
• Rearrangeable Non-Blocking (RNB)
– An idle input port and an idle output port are connectable, possibly after changing existing connections
• Blocking
– Not all input ports and output ports are connectable under certain cases
Crossbar
• SNB
$C_{CB} = N^2 \times W$
Clos Network
SNB/RNB
C(m,n,k)
N = nk
SNB: m >= 2n-1
RNB: m = n
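The conditions above can be checked with a small Python helper. It assumes the usual reading of C(m,n,k): k input switches with n inputs each and m middle-stage switches. The slide lists m = n for RNB, which is the smallest such value; the sketch accepts any m >= n. The function names are made up for this example.

```python
def clos_ports(n, k):
    """Total number of network inputs, N = n * k."""
    return n * k


def clos_class(m, n):
    """Classify a Clos network C(m, n, k) by its number of middle switches m."""
    if m >= 2 * n - 1:
        return "SNB"        # strictly non-blocking
    if m >= n:
        return "RNB"        # rearrangeable non-blocking
    return "blocking"


print(clos_ports(2, 4), clos_class(2, 2))   # the C(2,2,4) case
print(clos_class(3, 2))                     # one more middle switch
```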
Benes Network
Multi-stage Clos: C(2,2,4) + 2C(2,2,2)
RNB (m = n = 2)
Area of Switching Networks
Problems of all Switching Networks
• Crossbar
– Area ~ $N^2$
– Easy to schedule
• Clos
– Area ~ $N^{1.5}$
– Difficult but possible to schedule by hardware
– Optimal area is reached when [condition not recovered]
• Benes
– Area ~ $N \log N$
– Impossible to schedule by hardware (microprocessor)
– Optimal area is reached when $N = 2^n$
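To make the area trends concrete, the Python sketch below counts crosspoints for the three networks at unit link width. It assumes a 3-stage Clos C(m,n,k) built from n×m, k×k, and m×n switches, and a Benes network built from 2×2 switches with 2·log2(N) − 1 stages; these are textbook constructions used for illustration and may differ from the area model behind the slides.

```python
def crossbar_crosspoints(N):
    """Single N x N crossbar."""
    return N * N


def clos3_crosspoints(m, n, k):
    """3-stage Clos: k (n x m) input, m (k x k) middle, k (m x n) output switches."""
    return 2 * k * n * m + m * k * k


def benes_crosspoints(N):
    """Benes network of 2x2 switches; N must be a power of two."""
    if N < 2 or N & (N - 1):
        raise ValueError("N must be a power of two")
    stages = 2 * (N.bit_length() - 1) - 1
    return stages * (N // 2) * 4


N = 64
print(crossbar_crosspoints(N))          # crossbar
print(clos3_crosspoints(15, 8, 8))      # SNB Clos with n = k = 8, m = 2n - 1
print(benes_crosspoints(N))             # Benes
```

For small networks the ordering can flip: with N = 8 the Benes count in this model is larger than the crossbar count.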
Index
• Spatial Division Multiplexing (SDM)
– 2-stage Clos network
SDM: 2-stage Clos Network
[Figure: 2-stage Clos network. Five M×M input modules (SIM, WIM, NIM, EIM, LIM), M 5×5 central modules (CM(0) to CM(M-1)), and five output modules (SOM, WOM, NOM, EOM, LOM)]
Area Comparison
Benefits of the 2-stage Clos Network
• Minimal area when M <= 16
• Only 2 stages, so latency is reduced
• Latency bounded
• Scheduling algorithm is also simplified
• The CMs could be further reduced
• It is an RNB network. An SNB network requires 3 stages
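A back-of-envelope comparison in Python, under loudly stated assumptions: cost is counted as crosspoints times link width (as in the crossbar formula earlier in the deck), each of the P input modules is an M×M switch, each of the M central modules is a P×P switch, all internal links are W/M wide, and the output modules add no switching cost. None of this is confirmed by the slides, so the numbers only indicate the trend.

```python
def wormhole_crossbar_cost(P, W):
    """P x P crossbar with W-wide links."""
    return P * P * W


def sdm_crossbar_cost(P, M, W):
    """MP x MP crossbar with (W / M)-wide links; assumes M divides W."""
    return (M * P) ** 2 * (W // M)


def two_stage_clos_cost(P, M, W):
    """P input modules (M x M) plus M central modules (P x P), links W / M wide."""
    width = W // M
    input_modules = P * (M * M) * width
    central_modules = M * (P * P) * width
    return input_modules + central_modules


P, M, W = 5, 4, 64      # five ports, 4 virtual circuits per port, 64-bit ports
print(wormhole_crossbar_cost(P, W))
print(sdm_crossbar_cost(P, M, W))
print(two_stage_clos_cost(P, M, W))
```

In this model the 2-stage Clos cost simplifies to P × W × (M + P), which grows far more slowly with M than the M × P² × W of a full SDM crossbar.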
Index
• Spatial Division Multiplexing (SDM)
– The distributed scheduler
SDM: Scheduling Algorithms
• Optimized algorithms
– Always reach the optimal configuration, in which every possible connection is configured
– Time complexity $O(N^2)$
– Normally software based ([Leroy 2008]: microprocessor, 64 ports, 50 µs)
• Heuristic algorithms
– Capable of configuring part of the possible connections with less time and area
– Time complexity $O(N)$ ~ $O(\log N)$
– Normally hardware implementable, distributed, and scalable
Synchronous Dispatch Algs.
Problems of Sync Algs.
• Iterations are synchronised.
• The requests from IMs are blind and greedy.
• CMs are blind and greedy too.
• Multiple requests are sent out by IMs
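The behaviour criticised above can be mimicked with a generic request/grant loop in Python. This is a minimal sketch of an iterative dispatch scheme for a 2-stage Clos network, not the algorithm shown on the slides: the function name, the random link choice, and the data layout are all assumptions. Each pending request blindly picks a free IM-to-CM link, each CM greedily grants the first request it sees per output port, and losers retry in the next synchronised iteration.

```python
import random


def greedy_dispatch(requests, P, M, iterations=4, seed=0):
    """requests: unique (im, vc, out_port) tuples. Returns {request: cm_index}."""
    rng = random.Random(seed)
    link_free = {(i, r) for i in range(P) for r in range(M)}   # IM i -> CM r
    out_free = {(r, j) for r in range(M) for j in range(P)}    # CM r -> output port j
    pending = list(requests)
    granted = {}
    for _ in range(iterations):
        # Request phase: pick a free link without knowing the CM's state (blind).
        proposals = {}
        for req in pending:
            im = req[0]
            choices = [r for r in range(M)
                       if (im, r) in link_free and (im, r) not in proposals]
            if choices:
                proposals[(im, rng.choice(choices))] = req
        # Grant phase: each CM accepts the first request per output (greedy).
        for (im, cm), req in proposals.items():
            out_port = req[2]
            if (cm, out_port) in out_free:
                out_free.discard((cm, out_port))
                link_free.discard((im, cm))
                granted[req] = cm
        pending = [q for q in pending if q not in granted]
        if not pending:
            break
    return granted


P, M = 5, 4
reqs = [(i, v, (i + v) % P) for i in range(P) for v in range(M)]
result = greedy_dispatch(reqs, P, M)
print(len(result), "of", len(reqs), "requests connected")
```

Because neither side looks ahead, two requests for the same output port can collide at one CM while another CM's output stays idle, which is why several iterations may be needed.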
Asynchronous Scheduling Alg.
• The IM scheduler and the CM schedulers are independent
• The scheduling algorithm can support an arbitrary number of CMs
• Lower transition rate than synchronous schedulers
IM scheduler (1)
[Figure: IM scheduler circuit; signals IMr[0][0] to IMr[VCN-1][SN-1] and IMrb[0][0] to IMrb[VCN-1][SN-1]]
IM scheduler (2)
[Figure: IM scheduler circuit; signals h[i][j][k], IMr[j][i], IMrb[j][i], IMa[j], CMr[k][i], CMa[k][i], CMs[k][i], CMsb[i][k], CMrKeep[i][k][j], CMrMx[i][k][j], CMrME[k][i], cfgMx[j][k][i], cfg[j][k]; indices run up to VCN-1, SN-1, and CMN-1]
CM scheduler
Index
• Spatial Division Multiplexing (SDM)
– Implementation results
SDM: implementation (1)
• Faraday 130 nm
• Wormhole, SDM crossbar, and SDM Clos
• 64-bit ports, 4 virtual circuits/port
• Design Compiler synthesized
• SystemVerilog for testbench
• Switches are back-annotated with latency from synthesis
SDM: implementation (2)
Network Performance (1)
Network Performance (2)
Conclusion of Results
• SDM outperforms Wormhole with short frames and local traffic
• The connection loss from SNB to RNB is significant
• SDM is good at GT traffic; this work is the first step towards a QoS router
• How to configure the SDM to settle GT paths is the next problem.
References
• Channel Slicing:
– ASP-DAC 2010.
– UK Async Forum, 2009.
– International Symposium on SOC, 2009.
• SDM:
– In submission to ASYNC 2010.
Questions?