Do We Need Wide Flits in Networks-On-Chip?
Junghee Lee, Chrysostomos Nicopoulos, Sung Joo Park, Madhavan Swaminathan and Jongman Kim
Presented by Junghee Lee
Introduction
• Increasing number of cores → communication-centric design → packet-based Networks-on-Chip
• Units
  – Packet: a meaningful unit of the upper-layer protocol
  – Flit: the smallest unit of flow control maintained by the NoC
• If a packet is larger than a flit, it is split into multiple flits
• The flit size usually matches the physical channel width
Motivation
What is the optimal flit size in Networks-on-Chip for general-purpose computing?
Flit sizes in use (bits):
• 64 or 128: research papers
• 256 or 512: research papers
• 144: Intel Single-Chip Cloud
• 160: Tilera
• 256: Intel Sandy Bridge
Multifaceted Factors
Factors that interact with the flit size:
• Global wires
• Cost of router
• Workload
• Latency
• Throughput
A first attempt at drawing a balanced conclusion
Assumed NoC Router Architecture
[Figure: router architecture diagram with parameters labeled d, v, c, and p]
Packet and Flit
[Figure: packet layout showing the Header and Payload fields]
Simulation Environment
Parameter                  Default Value
Simulator                  Simics + GEMS (Garnet)
Benchmark                  PARSEC
Number of processors       64
Operating system           Linux Fedora
L1 cache size              32 KB
L1 cache number of ways    4
L1 cache line size         64 B
L2 cache (shared)          16 MB, 16-way, 128-B line
MSHR size                  32 for I-cache and 32 for D-cache
Main memory                2 GB SDRAM
Cache coherence protocol   MOESI directory
Topology                   2D mesh
Default NoC Parameters
Parameter                    Default Value
Number of virtual channels   3
Buffer depth                 8 flits per virtual channel
Number of pipeline stages    4
Number of ports              5
Header overhead              16 bits
Key Questions
• Can we afford wide flits as technology scales?
• Is the cost of wide-flit routers justifiable?
• How much do wide flits contribute to overall performance?
• Do memory-intensive workloads need wide flits?
• Do we need wider flits as the number of processing elements increases?
#1) Global Wires
Can we afford wide flits as technology scales?
Technology scaling does not allow for a direct widening of the flits, because the share of chip power consumed by the global wires increases as technology scales.
Item                       Unit          Value
Technology                 nm            65     45     32     22
Chip size*                 mm²           260    260    260    260
Transistors*               MTRs          1106   2212   4424   8848
Global wiring pitch*       nm            290    205    140    100
Power index*               W/(GHz·cm²)   1.6    1.8    2.2    2.7
Total chip power*          W             198    146    158    143
Normalized power portion   (none)        1.00   1.53   1.66   2.28
* International Technology Roadmap for Semiconductors (ITRS) 2009 and 2011
#2) Cost of Router
Is the cost of wide-flit routers justifiable?
• Cost of buffers ∝ flit size × buffer depth × number of virtual channels
• Cost of switch ∝ (flit size)² × (number of ports)²
[Figure: cost versus flit size, with separate curves for the switch and the buffers]
• Flit size ×2 → cost of router ×2.97
• Flit size ×4 → cost of router ×10.10
If the performance improvement does not compensate for the increase in cost, widening the flit is hard to justify.
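The scaling behaviour described on this slide can be sketched numerically. The snippet below is only an illustration, not the authors' cost model: it assumes that router cost is the sum of a buffer term that grows linearly with flit size and a switch term that grows quadratically, and that the two terms split the baseline cost according to a hypothetical `switch_share` parameter. The function name and the 50/50 split are assumptions made for this example.

```python
def relative_router_cost(flit_scale: float, switch_share: float) -> float:
    """Router cost relative to the baseline flit size.

    flit_scale   -- new flit size divided by the baseline flit size
    switch_share -- ASSUMED fraction of the baseline router cost that comes
                    from the switch; the remainder is attributed to buffers
    """
    if not 0.0 <= switch_share <= 1.0:
        raise ValueError("switch_share must be between 0 and 1")
    buffer_share = 1.0 - switch_share
    # Buffers grow linearly with flit size, the switch grows quadratically.
    return buffer_share * flit_scale + switch_share * flit_scale ** 2


if __name__ == "__main__":
    # Hypothetical 50/50 split between buffer and switch cost at the baseline.
    for scale in (1, 2, 4):
        cost = relative_router_cost(scale, switch_share=0.5)
        print(f"flit size x{scale}: router cost x{cost:.2f}")
```

With the assumed 50/50 split, doubling and quadrupling the flit size give cost factors of 3.00 and 10.00, which is in the same range as the 2.97 and 10.10 reported on the slide. The slide's figures come from the authors' own evaluation, so the match should be read as a plausibility check only.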
#3) Latency
How much do wide flits contribute to overall performance?
• The network traffic usually consists of packets of different sizes
  – l_s: the size of the shortest packet
  – l_l: the size of the longest packet
[Figure: latency versus flit size, with l_s + h and l_l + h marked on the flit-size axis]
Suggested rule of thumb: flit size = shortest packet size + header overhead
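A small calculation shows why the shortest packet size plus the header overhead is a natural threshold. The sketch below counts how many flits a packet needs for several flit sizes. It assumes that every flit carries the header overhead, and the packet sizes (64 and 576 bits) are hypothetical values chosen for illustration. Only the 16-bit header overhead is taken from the default NoC parameters.

```python
def flits_per_packet(packet_bits: int, flit_bits: int, header_bits: int) -> int:
    """Number of flits needed for one packet.

    ASSUMPTION: each flit spends header_bits on overhead, so only
    (flit_bits - header_bits) bits per flit carry packet content.
    """
    payload_per_flit = flit_bits - header_bits
    if payload_per_flit <= 0:
        raise ValueError("flit must be wider than the header overhead")
    # Integer ceiling division.
    return -(-packet_bits // payload_per_flit)


HEADER_BITS = 16    # header overhead from the default NoC parameters
SHORT_PACKET = 64   # hypothetical shortest packet size, in bits
LONG_PACKET = 576   # hypothetical longest packet size, in bits

if __name__ == "__main__":
    for flit_bits in (48, 80, 160, 592):
        short = flits_per_packet(SHORT_PACKET, flit_bits, HEADER_BITS)
        long_ = flits_per_packet(LONG_PACKET, flit_bits, HEADER_BITS)
        print(f"{flit_bits:4d}-bit flit: short packet = {short} flit(s), "
              f"long packet = {long_} flit(s)")
```

Under these assumptions the short packet already fits in a single flit at 80 bits (64 + 16), so wider flits shorten only the long packets, and even those stop improving once the flit reaches the longest packet size plus the header overhead.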
#4) Workload Characteristics
Do memory-intensive workloads need wide flits?
Application     Cache misses / Kcycle / node   Injected packets / Kcycle / node
Blackscholes    0.41                           2.21
Bodytrack       0.67                           3.56
Ferret          0.26                           1.43
Fluidanimate    0.24                           1.35
Freqmine        0.28                           1.48
Streamcluster   0.48                           2.42
Swaptions       0.38                           2.04
Vips            0.23                           1.27
X264            0.28                           1.54
• The injection rate of real applications is far lower than the typical saturation point of a NoC → self-throttling effect [34]
• Up to 64 cores, we can keep the rule of thumb because of the low injection rate
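To put the injection rates in perspective, the sketch below converts the per-Kcycle figures from the table above into packets per cycle per node and reports the busiest benchmark. The numbers are copied from the table; the variable and function names are chosen for this example only.

```python
# (cache misses, injected packets), both per Kcycle per node, from the table.
WORKLOADS = {
    "Blackscholes": (0.41, 2.21),
    "Bodytrack": (0.67, 3.56),
    "Ferret": (0.26, 1.43),
    "Fluidanimate": (0.24, 1.35),
    "Freqmine": (0.28, 1.48),
    "Streamcluster": (0.48, 2.42),
    "Swaptions": (0.38, 2.04),
    "Vips": (0.23, 1.27),
    "X264": (0.28, 1.54),
}


def per_cycle(rate_per_kcycle: float) -> float:
    """Convert a rate per 1000 cycles into a rate per cycle."""
    return rate_per_kcycle / 1000.0


# Benchmark with the highest packet injection rate.
busiest = max(WORKLOADS, key=lambda name: WORKLOADS[name][1])
peak_rate = per_cycle(WORKLOADS[busiest][1])

if __name__ == "__main__":
    print(f"Busiest benchmark: {busiest}")
    print(f"Peak injection rate: {peak_rate:.5f} packets/cycle/node")
```

Even the most demanding benchmark in the table injects only a few thousandths of a packet per cycle per node, which is the quantity to compare against a network's saturation point.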
#5) Throughput
Do we need wider flits as the number of processing elements increases?
• Widening the flit is not a cost-effective way to increase throughput because of fragmentation
• If widening the physical channel is the only option for increasing throughput, we suggest using physically separated networks
[Figure: latency plot (axis labels as extracted: Flit size, Latency) comparing one 80-bit network, one 160-bit network, and two 80-bit networks]
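Fragmentation can be illustrated by measuring how much of the transmitted bandwidth carries packet content. The sketch below is an illustration under stated assumptions, not the authors' evaluation: it assumes each flit carries a 16-bit header overhead, and it uses hypothetical packet sizes of 64 and 576 bits. The function name is chosen for this example.

```python
def link_utilization(packet_bits: int, flit_bits: int, header_bits: int) -> float:
    """Fraction of transmitted bits that belong to the packet itself.

    ASSUMPTION: every flit spends header_bits on overhead, and the last
    flit of a packet is sent at full width even if only partly filled.
    """
    payload_per_flit = flit_bits - header_bits
    if payload_per_flit <= 0:
        raise ValueError("flit must be wider than the header overhead")
    num_flits = -(-packet_bits // payload_per_flit)  # ceiling division
    return packet_bits / (num_flits * flit_bits)


HEADER_BITS = 16    # header overhead from the default NoC parameters
SHORT_PACKET = 64   # hypothetical short packet size, in bits
LONG_PACKET = 576   # hypothetical long packet size, in bits

if __name__ == "__main__":
    for flit_bits in (80, 160):
        short = link_utilization(SHORT_PACKET, flit_bits, HEADER_BITS)
        long_ = link_utilization(LONG_PACKET, flit_bits, HEADER_BITS)
        print(f"{flit_bits}-bit flit: short packet {short:.0%}, "
              f"long packet {long_:.0%}")
```

Under these assumptions a short packet uses 80% of an 80-bit flit but only 40% of a 160-bit flit, so doubling the channel width does not double the useful throughput for short packets. Two separate 80-bit networks avoid that loss because each short packet still occupies a narrow flit.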
Conclusions
• Can we afford wide flits as technology scales?
  – No, unless the power budget for the NoC increases
• Is the cost of wide-flit routers justifiable?
  – No, the cost increases sharply with the flit size
• How much do wide flits contribute to overall performance?
  – Only until the flit size reaches the shortest packet size
• Do memory-intensive workloads need wide flits?
  – No, because of the self-throttling effect
• Do we need wider flits as the number of processing elements increases?
  – No, because of fragmentation
Final Conclusion
• Suggested rule of thumb: flit size = shortest packet size + header overhead
• This paper provides a comprehensive discussion of all key aspects pertaining to the NoC's flit size
• This exploration could serve as a quick reference for designers and architects of general-purpose multi-core microprocessors who need to decide on an appropriate flit size for their design
Thank you!
Questions?
Contact info
Junghee Lee
[email protected]
Electrical and Computer Engineering
Georgia Institute of Technology