EE384Y: Packet Switch Architectures Part II Scaling Crossbar Switches. Nick McKeown, Professor of Electrical Engineering and Computer Science, Stanford University. [email protected] http://www.stanford.edu/~nickm
EE384y 12004
High Performance Switching and Routing
Telecom Center Workshop: Sept 4, 1997.
EE384Y: Packet Switch Architectures, Part II
Scaling Crossbar Switches
Nick McKeown
Professor of Electrical Engineering and Computer Science, Stanford University
[email protected]
http://www.stanford.edu/~nickm
Outline
Up until now, we have focused on high performance packet switches with:
1. A crossbar switching fabric,
2. Input queues (and possibly output queues as well),
3. Virtual output queues, and
4. Centralized arbitration/scheduling algorithm.
Today we’ll talk about the implementation of the crossbar switch fabric itself. How are crossbars built, how do they scale, and what limits their capacity?
Crossbar switch: Limiting factors
1. N^2 crosspoints per chip, or N x (N-to-1) multiplexors,
2. It’s not obvious how to build a crossbar from multiple chips,
3. Capacity of “I/O”s per chip. State of the art: about 300 pins, each operating at 3.125 Gb/s ~= 1 Tb/s per chip. About 1/3 to 1/2 of this capacity is available in practice because of overhead and speedup.
Crossbar chips today are limited by “I/O” capacity.
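The I/O arithmetic on this slide can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the pin count, per-pin rate, and the 1/3 to 1/2 usable fraction come from the slide, while the variable names are made up for illustration.

```python
# Back-of-the-envelope I/O capacity of one crossbar chip (figures from the slide).
PINS = 300                # approximate number of high-speed pins
RATE_GBPS = 3.125         # per-pin signalling rate in Gb/s

raw_gbps = PINS * RATE_GBPS          # aggregate raw I/O capacity
usable_low_gbps = raw_gbps / 3       # pessimistic usable share (overhead + speedup)
usable_high_gbps = raw_gbps / 2      # optimistic usable share

print(f"raw:    {raw_gbps:.1f} Gb/s (about 1 Tb/s)")
print(f"usable: {usable_low_gbps:.1f} to {usable_high_gbps:.1f} Gb/s")
```
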
Scaling number of outputs: Trying to build a crossbar from multiple chips
[Figure: building block with 4 inputs and 4 outputs; 16x16 crossbar switch.]
Eight inputs and eight outputs required!
Scaling line-rate: Bit-sliced parallelism
[Figure: a linecard sends each cell across crossbar planes 1, 2, ..., 8, ..., k, all controlled by one scheduler.]
• Cell is “striped” across multiple identical planes.
• Crossbar switched “bus”.
• Scheduler makes same decision for all slices.
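As a rough illustration of striping, the sketch below splits one cell across k planes and reassembles it at the output. It stripes at byte granularity rather than bit granularity purely for readability. The function names `stripe` and `reassemble` and the 64-byte cell size are assumptions, not part of the slides.

```python
def stripe(cell, k):
    """Split one cell into k slices; plane p carries bytes p, p+k, p+2k, ..."""
    return [cell[p::k] for p in range(k)]


def reassemble(slices):
    """Interleave the k slices back into the original cell at the output linecard."""
    k = len(slices)
    out = bytearray(sum(len(s) for s in slices))
    for p, s in enumerate(slices):
        out[p::k] = s          # extended-slice assignment puts slice p back in place
    return bytes(out)


cell = bytes(range(64))        # hypothetical 64-byte cell
planes = stripe(cell, 8)       # every plane is configured identically by the scheduler
restored = reassemble(planes)
```

Because every plane carries 1/k of each cell in the same time slot, all planes use the same crossbar configuration, which is why a single scheduling decision suffices.
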
Scaling line-rate: Time-sliced parallelism
[Figure: a linecard sends successive cells to crossbar planes 1, 2, ..., 8, ..., k in turn, controlled by one scheduler.]
• Cell carried by one plane; takes k cell times.
• Scheduler is unchanged.
• Scheduler makes decision for each slice in turn.
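A minimal sketch of the time-sliced alternative, assuming cells are dealt to planes in simple round-robin order (the slides do not specify the exact order). The helper name `time_slice_schedule` is hypothetical.

```python
def time_slice_schedule(num_cells, k):
    """Assign cell t to plane t mod k; each plane runs at 1/k of the line rate,
    so a cell occupies its plane for k cell times.

    Returns tuples (cell_index, plane, start_slot, finish_slot)."""
    return [(t, t % k, t, t + k) for t in range(num_cells)]


schedule = time_slice_schedule(num_cells=8, k=4)
for cell_index, plane, start, finish in schedule:
    print(f"cell {cell_index}: plane {plane}, slots [{start}, {finish})")
```

Under this assumption a plane receives a new cell only every k slots, which is exactly when its previous cell finishes, so no plane is ever asked to carry two cells at once.
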
Scaling a crossbar
Conclusion: scaling the capacity is relatively straightforward (although the chip count and power may become a problem).
What if we want to increase the number of ports?
Can we build a crossbar-equivalent from multiple stages of smaller crossbars?
If so, what properties should it have?
3-stage Clos Network
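[Figure: three-stage Clos network with inputs 1 to N and outputs 1 to N. First stage: m switches (1, 2, …, m), each n x k. Middle stage: k switches (1, 2, …, k), each m x m. Third stage: m switches (1, 2, …, m), each k x n.]
N = n x m
k >= n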
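To get a feel for what the three-stage structure buys, the sketch below counts crosspoints for a Clos network with the dimensions above and compares it with a single N x N crossbar. The formula simply sums the crosspoints of the m first-stage (n x k), k middle-stage (m x m), and m third-stage (k x n) switches. The function names and the N = 1024 example are assumptions for illustration.

```python
def crossbar_crosspoints(N):
    """A single N x N crossbar needs N^2 crosspoints."""
    return N * N


def clos_crosspoints(n, m, k):
    """m switches of n x k, k switches of m x m, m switches of k x n."""
    return m * n * k + k * m * m + m * k * n


n = m = 32                       # hypothetical sizing: N = n x m = 1024 ports
N = n * m
print("crossbar:      ", crossbar_crosspoints(N))
print("Clos, k = n:   ", clos_crosspoints(n, m, k=n))
print("Clos, k = 2n-1:", clos_crosspoints(n, m, k=2 * n - 1))
```
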
With k = n, is a Clos network non-blocking like a crossbar?
Consider the example: scheduler chooses to match (1,1), (2,4), (3,3), (4,2)
With k = n, is a Clos network non-blocking like a crossbar?
Consider the example: scheduler chooses to match (1,1), (2,2), (4,4), (5,3), …
By rearranging matches, the connections could be added.
Q: Is this Clos network “rearrangeably non-blocking”?
With k = n a Clos network is rearrangeably non-blocking
Routing matches is equivalent to edge-coloring in a bipartite multigraph.
Colors correspond to middle-stage switches.
[Figure: bipartite multigraph for the matches (1,1), (2,4), (3,3), (4,2).]
Each vertex corresponds to an n x k or k x n switch.
No two edges at a vertex may be colored the same.
Vizing ‘64: a D-degree bipartite graph can be colored in D colors.
Therefore, if k = n, a 3-stage Clos network is rearrangeably non-blocking (and can therefore perform any permutation).
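The correspondence between routing and edge-coloring can be sketched in code. The example below is not one of the algorithms named later in the deck. It uses the textbook alternating-path argument (the same one behind the D-color result for bipartite multigraphs, often credited to König) to assign a middle-stage switch to every match. The function name `route_clos`, the 1-based port numbering, and the mapping of port p to switch (p - 1) // n are assumptions for illustration.

```python
def route_clos(matches, n, k):
    """Assign a middle-stage switch (color) to each (input_port, output_port) match.

    Ports are 1-based; port p sits on first/third-stage switch (p - 1) // n.
    Assumes every switch carries at most k matches, which holds when k >= n.
    """
    edges = [((i - 1) // n, (j - 1) // n) for i, j in matches]
    m = 1 + max(max(u, v) for u, v in edges)
    left = [dict() for _ in range(m)]    # left[u][color]  -> edge id at input switch u
    right = [dict() for _ in range(m)]   # right[v][color] -> edge id at output switch v
    color = [None] * len(edges)

    for e, (u, v) in enumerate(edges):
        a = next(c for c in range(k) if c not in left[u])    # color free at u
        b = next(c for c in range(k) if c not in right[v])   # color free at v
        if a != b:
            # Collect the path starting at v whose edges alternate a, b, a, ...
            path, node, on_right, c = [], v, True, a
            while True:
                eid = (right if on_right else left)[node].get(c)
                if eid is None:
                    break
                path.append(eid)
                x, y = edges[eid]
                node, on_right = (x, False) if on_right else (y, True)
                c = b if c == a else a
            # Swap a and b along that path; afterwards a is free at v as well.
            for eid in path:
                x, y = edges[eid]
                del left[x][color[eid]]
                del right[y][color[eid]]
            for eid in path:
                x, y = edges[eid]
                color[eid] = b if color[eid] == a else a
                left[x][color[eid]] = eid
                right[y][color[eid]] = eid
        color[e] = a
        left[u][a] = e
        right[v][a] = e
    return color


matches = [(1, 1), (2, 4), (3, 3), (4, 2)]   # the example from the slides
print(route_clos(matches, n=2, k=2))          # middle switch index per match
```

The path can never reach the input switch u, because it enters input switches only over edges colored a, and a is free at u. That is the step that relies on the graph being bipartite.
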
How complex is the rearrangement?
Method 1: Find a maximum size bipartite matching for each of D colors in turn, O(D N^2.5).
Method 2: Partition graph into Euler sets, O(N log D) [Cole et al. ‘00]
Edge-Coloring using Euler sets
Make the graph regular: Modify the graph so that every vertex has the same degree, D. [combine vertices and add edges; O(E)].
For D = 2^i, perform i “Euler splits” and 1-color each resulting graph. This is log D operations, each of O(E).
Euler partition of a graph
Euler partition of graph G: a partition of the edges of G into open and closed paths such that
1. Each odd degree vertex is at the end of one open path.
2. Each even degree vertex is at the end of no open path.
Euler split of a graph
Euler split of G into G1 and G2:
1. Scan each path in an Euler partition.
2. Place each alternate edge into G1 and G2.
[Figure: graph G and its Euler split into G1 and G2.]
Edge-Coloring using Euler sets
Make the graph regular: Modify the graph so that every vertex has the same degree, D. [combine vertices and add edges; O(E)].
For D = 2^i, perform i “Euler splits” and 1-color each resulting graph. This is log D operations, each of O(E).
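The two steps can be sketched as follows for the special case where the graph is already D-regular and D is a power of two. The padding step that makes the graph regular is not shown. Because every vertex then has even degree, the Euler partition consists of closed paths only, and alternating edges along each one halves every vertex degree. The names `euler_split` and `euler_color` are assumptions, and this is an illustrative sketch rather than the algorithm of Cole et al.

```python
from collections import defaultdict


def euler_split(edges):
    """Split a bipartite multigraph with all-even degrees into two halves.

    edges: list of (left_vertex, right_vertex). Returns two lists of edge indices;
    each vertex keeps exactly half of its edges in each list.
    """
    adj = defaultdict(list)                     # vertex -> incident edge ids
    for e, (u, v) in enumerate(edges):
        adj[("L", u)].append(e)
        adj[("R", v)].append(e)
    used = [False] * len(edges)
    halves = ([], [])
    for start in list(adj):
        node, parity = start, 0
        while True:                             # walk one closed path from start
            while adj[node] and used[adj[node][-1]]:
                adj[node].pop()                 # drop edges already walked
            if not adj[node]:
                break                           # stuck: the path has closed at start
            e = adj[node].pop()
            used[e] = True
            halves[parity].append(e)            # alternate edges between G1 and G2
            parity ^= 1
            u, v = edges[e]
            node = ("R", v) if node[0] == "L" else ("L", u)
    return halves


def euler_color(edges, degree):
    """Edge-color a degree-regular bipartite multigraph, degree = 2^i, by i levels of splits."""
    assert degree >= 1 and degree & (degree - 1) == 0, "sketch handles powers of two only"
    colors = [None] * len(edges)

    def recurse(ids, d, first_color):
        if d == 1:                              # a 1-regular graph is a perfect matching
            for e in ids:
                colors[e] = first_color
            return
        g1, g2 = euler_split([edges[e] for e in ids])
        recurse([ids[i] for i in g1], d // 2, first_color)
        recurse([ids[i] for i in g2], d // 2, first_color + d // 2)

    recurse(list(range(len(edges))), degree, 0)
    return colors


# Hypothetical 4-regular multigraph on two input and two output switches.
demo = [(0, 0), (0, 0), (0, 1), (0, 1), (1, 0), (1, 0), (1, 1), (1, 1)]
print(euler_color(demo, degree=4))
```

Each level of the recursion touches every edge once, which matches the slide's count of log D operations of O(E) each.
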
Implementation
[Figure: Request graph -> Scheduler -> Permutation -> Route connections -> Paths]
Implementation
Pros
• A rearrangeably non-blocking switch can perform any permutation
• A cell switch is time-slotted, so all connections are rearranged every time slot anyway
Cons
• Rearrangement algorithms are complex (in addition to the scheduler)
Can we eliminate the need to rearrange?
Strictly non-blocking Clos Network
Clos’ Theorem: If k >= 2n – 1, then a new connection can always be added without rearrangement.
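[Figure: three-stage Clos network with inputs 1 to N and outputs 1 to N. First stage: switches I1, I2, …, Im, each n x k. Middle stage: switches M1, M2, …, Mk, each m x m. Third stage: switches O1, O2, …, Om, each k x n.]
N = n x m
k >= n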
Clos Theorem
[Figure: first-stage switch Ia and third-stage switch Ob, each with n ports (1 to n) and k links (1 to k) to the middle stage; n – 1 middle-stage links already in use at input and output.]
1. Consider adding the n-th connection between 1st stage Ia and 3rd stage Ob.
2. We need to ensure that there is always some center-stage M available.
3. If k > (n – 1) + (n – 1), then there is always an M available, i.e. we need k >= 2n – 1.
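The counting argument can be exercised with a small simulation. In this sketch a new connection takes any middle-stage switch that is free at both its first-stage and third-stage switch, and existing connections are never rearranged. With k = 2n – 1 the theorem says a free switch always exists, so the blocked count should stay at zero. The class `ClosFabric`, the helper `churn`, the 0-based port numbering, and the random traffic pattern are all assumptions for illustration.

```python
import random


class ClosFabric:
    """Tracks which middle-stage switches each first/third-stage switch is using."""

    def __init__(self, n, m, k):
        self.n, self.m, self.k = n, m, k
        self.in_busy = [set() for _ in range(m)]    # per first-stage switch
        self.out_busy = [set() for _ in range(m)]   # per third-stage switch
        self.route = {}                             # (input, output) -> middle switch

    def connect(self, inp, out):
        """Add a connection without rearranging; return the middle switch or None if blocked."""
        a, b = inp // self.n, out // self.n         # 0-based ports, n ports per switch
        free = set(range(self.k)) - self.in_busy[a] - self.out_busy[b]
        if not free:
            return None
        mid = min(free)
        self.in_busy[a].add(mid)
        self.out_busy[b].add(mid)
        self.route[(inp, out)] = mid
        return mid

    def disconnect(self, inp, out):
        mid = self.route.pop((inp, out))
        self.in_busy[inp // self.n].discard(mid)
        self.out_busy[out // self.n].discard(mid)


def churn(n, m, k, steps=2000, seed=0):
    """Randomly add and remove connections; return how many additions were blocked."""
    rng = random.Random(seed)
    fab = ClosFabric(n, m, k)
    idle_in, idle_out = list(range(n * m)), list(range(n * m))
    blocked = 0
    for _ in range(steps):
        if idle_in and (not fab.route or rng.random() < 0.6):
            i = idle_in.pop(rng.randrange(len(idle_in)))
            o = idle_out.pop(rng.randrange(len(idle_out)))
            if fab.connect(i, o) is None:
                blocked += 1
                idle_in.append(i)
                idle_out.append(o)
        else:
            i, o = rng.choice(list(fab.route))
            fab.disconnect(i, o)
            idle_in.append(i)
            idle_out.append(o)
    return blocked


print("k = 2n - 1:", churn(n=3, m=4, k=5), "blocked")
```

Running the same churn with k = n may report blocked additions, which is the case where rearrangement would be needed.
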
Scaling Crossbars: Summary
Scaling capacity through parallelism (bit-slicing and time-slicing) is straightforward.
Scaling number of ports is harder… Clos network:
• Rearrangeably non-blocking with k = n, but routing is complicated,
• Strictly non-blocking with k >= 2n – 1, so routing is simple, but it requires more bisection bandwidth.