argo - an area-efficient time-division-multiplexed network ... · argo – an area-efficient...
TRANSCRIPT
![Page 1: Argo - An area-efficient time-division-multiplexed network ... · Argo – An area-efficient time-division-multiplexed network-on-chip for real-time multicore platforms . Jens Sparsø](https://reader036.vdocument.in/reader036/viewer/2022071217/604d4932464e6c254c3ecc65/html5/thumbnails/1.jpg)
Argo – An area-efficient time-division-multiplexed network-on-chip for real-time multicore platforms Jens Sparsø [email protected]
![Page 2: Argo - An area-efficient time-division-multiplexed network ... · Argo – An area-efficient time-division-multiplexed network-on-chip for real-time multicore platforms . Jens Sparsø](https://reader036.vdocument.in/reader036/viewer/2022071217/604d4932464e6c254c3ecc65/html5/thumbnails/2.jpg)
05/07/2016
Jens Sparsø / RTN 2016 Keynote 2 DTU Compute, Technical University of Denmark
Introduction – NoC for real-time • Hard real-time systems need
worst case execution time (WCET) guarantees: – Execution time of tasks
• Time predictable processor cores – Guaranteed latency and bandwidth of task-to-task communication
• Time predictable NoC • Time predictable NoC
– End-to-end virtual circuits • No interference among traffic flows • Analysis of individual traffic flows one-by-one
– Solution: • Time division multiplexing (TDM) and static scheduling • Rate control and non-blocking routers + network calculus
![Page 3: Argo - An area-efficient time-division-multiplexed network ... · Argo – An area-efficient time-division-multiplexed network-on-chip for real-time multicore platforms . Jens Sparsø](https://reader036.vdocument.in/reader036/viewer/2022071217/604d4932464e6c254c3ecc65/html5/thumbnails/3.jpg)
05/07/2016
Jens Sparsø / RTN 2016 Keynote 4 DTU Compute, Technical University of Denmark
Q: Why yet another NoC after 10-15 years of NoC-research?
Q: Why yet another TDM-based NoC? – TU/e: Æthereal, Aelite, dAelite – KTH: Nostrum – TU Vienna: TTNoC
A: Many NoC designs, both BE og GS, have very large implementations. – Buffers and flow control – Size of router + NI often approach size of processor core. [As I see it] a result of : – Focus on providing solutions – Focus on layering and encapsulation
Introduction – why another NoC
![Page 4: Argo - An area-efficient time-division-multiplexed network ... · Argo – An area-efficient time-division-multiplexed network-on-chip for real-time multicore platforms . Jens Sparsø](https://reader036.vdocument.in/reader036/viewer/2022071217/604d4932464e6c254c3ecc65/html5/thumbnails/4.jpg)
05/07/2016
Jens Sparsø / RTN 2016 Keynote 5 DTU Compute, Technical University of Denmark
Introduction – Argo • TDM scheduling requires a common global notion of time. • Chip technology calls for Globally-Asynchronous Locally-Synchronous
(GALS) or Mesochronous timing architecture
• ARGO combines TDM and GALS/Mesochronous – Individually clocked processors (GALS) – Mesochronous network interfaces – Asynchronous routers
• ARGO avoids all buffering and (run time) flow control. TDM-mechanism transfer data end-to-end, i.e., SPM to SPM. (not just NI to NI as in other NoCs)
– A novel NI microarchitecture with a very small HW-implementation
![Page 5: Argo - An area-efficient time-division-multiplexed network ... · Argo – An area-efficient time-division-multiplexed network-on-chip for real-time multicore platforms . Jens Sparsø](https://reader036.vdocument.in/reader036/viewer/2022071217/604d4932464e6c254c3ecc65/html5/thumbnails/5.jpg)
05/07/2016
Jens Sparsø / RTN 2016 Keynote 6 DTU Compute, Technical University of Denmark
Acknowledgements • Presentation is based on joint work with:
– Associate Professor Martin Schoeberl – Wolfgang Puffitsch (Post Doc. 2013-2016) – Evangelia Kasapaki (PhD student 2011-2015) – Rasmus Bo Sørensen (PhD student 2013-2016)
• Work has been partly funded by: – European Union’s 7th Framework Programme: Time-predictable
Multi-Core Architecture for Embedded Systems (T-CREST) under grant agreement no. 288008. [2011-14]
– “Hard Real-Time Embedded Multiprocessor Platform - RTEMP” funded by the Danish Research Council for Technology and Production Sciences under contract no. 12-127600. [2013-2016]
![Page 6: Argo - An area-efficient time-division-multiplexed network ... · Argo – An area-efficient time-division-multiplexed network-on-chip for real-time multicore platforms . Jens Sparsø](https://reader036.vdocument.in/reader036/viewer/2022071217/604d4932464e6c254c3ecc65/html5/thumbnails/6.jpg)
05/07/2016
Jens Sparsø / RTN 2016 Keynote 7 DTU Compute, Technical University of Denmark
Outline 1. Introduction 2. Background
– The T-CREST multi-core platform – Message passing, TDM and static scheduling
3. Architecture and implementation of Argo (globally synchronous) – Router – NI microarchitecture – Results (area)
4. Timing organization of Argo – Individually clocked processor cores – Mesochronous NIs – Network of asynchronous routers
5. Analysis of (clock and reset) skew tolerance 6. Generating schedules 7. Conclusion
![Page 7: Argo - An area-efficient time-division-multiplexed network ... · Argo – An area-efficient time-division-multiplexed network-on-chip for real-time multicore platforms . Jens Sparsø](https://reader036.vdocument.in/reader036/viewer/2022071217/604d4932464e6c254c3ecc65/html5/thumbnails/7.jpg)
05/07/2016
Jens Sparsø / RTN 2016 Keynote 8 DTU Compute, Technical University of Denmark
Meassage passing using DMAs + virtual circuits
CPU
I$ D$ SPM
DMA
NI
CPU
I$ D$ SPM
DMA
NI
CPU
I$ D$ SPM
DMA
NI
R R R
R R
Send(…) Receive(…)
![Page 8: Argo - An area-efficient time-division-multiplexed network ... · Argo – An area-efficient time-division-multiplexed network-on-chip for real-time multicore platforms . Jens Sparsø](https://reader036.vdocument.in/reader036/viewer/2022071217/604d4932464e6c254c3ecc65/html5/thumbnails/8.jpg)
05/07/2016
Jens Sparsø / RTN 2016 Keynote 9 DTU Compute, Technical University of Denmark
The T-CREST multicore platform • All components are designed to be time-predictable
• PATMOS processors – Dual issue RISC – Special caches (method, stack, …) – Private scratch pad memories (SPM)
• Argo NoC supporting message passing – TDM and static scheduling – Supports GALS timing organization
• Memory tree NoC – All processors towards one memory – TDM and static scheduling
Proc. node
![Page 9: Argo - An area-efficient time-division-multiplexed network ... · Argo – An area-efficient time-division-multiplexed network-on-chip for real-time multicore platforms . Jens Sparsø](https://reader036.vdocument.in/reader036/viewer/2022071217/604d4932464e6c254c3ecc65/html5/thumbnails/9.jpg)
05/07/2016
Jens Sparsø / RTN 2016 Keynote 10 DTU Compute, Technical University of Denmark
Communicating tasks, communicating processors, and scheduling of packets.
t2 t1
t4
t3
t6
t5
c0
c2
c1
c3
c4
c6
c7
c8 c9
c5
c10
NoC
P
t1 t2
P t3
P
P
t4
c4 c5 c8
c7
c6
c2 c3
t5 t6
Task graph Core communication graph
![Page 10: Argo - An area-efficient time-division-multiplexed network ... · Argo – An area-efficient time-division-multiplexed network-on-chip for real-time multicore platforms . Jens Sparsø](https://reader036.vdocument.in/reader036/viewer/2022071217/604d4932464e6c254c3ecc65/html5/thumbnails/10.jpg)
05/07/2016
Jens Sparsø / RTN 2016 Keynote 11 DTU Compute, Technical University of Denmark
Communicating tasks, communicating processors, and routing of packets in the NoC
t2 t1
t4
t3
t6
t5
c0
c2
c1
c3 c4
c6
c7
c8 c9
c5
c10
t1
t3
t2
t4 t5 t6 P
t1 t2
P t3
P
P t4
c4 c5 c78
c6
c23
t5 t6
c4
NoC c23
c4
TDM period
c23 c4
![Page 11: Argo - An area-efficient time-division-multiplexed network ... · Argo – An area-efficient time-division-multiplexed network-on-chip for real-time multicore platforms . Jens Sparsø](https://reader036.vdocument.in/reader036/viewer/2022071217/604d4932464e6c254c3ecc65/html5/thumbnails/11.jpg)
05/07/2016
Jens Sparsø / RTN 2016 Keynote 12 DTU Compute, Technical University of Denmark
Synchronous router for source-routed TDM-based NoC
HPU
HPU
HPU
. . .
. . .
. . .
. . .
X bar
Data[31:0] Address[15:0] Route[15:0]
Header flit Payload flit
Link (N)
Link (E)
Link (L)
Link (N)
Link (E)
Link (L)
Payload flit
Data[31:0]
Packet:
![Page 12: Argo - An area-efficient time-division-multiplexed network ... · Argo – An area-efficient time-division-multiplexed network-on-chip for real-time multicore platforms . Jens Sparsø](https://reader036.vdocument.in/reader036/viewer/2022071217/604d4932464e6c254c3ecc65/html5/thumbnails/12.jpg)
05/07/2016
Jens Sparsø / RTN 2016 Keynote 13 DTU Compute, Technical University of Denmark
Processor
P1
Processor
P2
Processor
P4
Processor
P4
Processor
P5
Processor
P6
X X X
X X X
Global schedule: 1) Departure time 2) Route
Packet ≈
![Page 13: Argo - An area-efficient time-division-multiplexed network ... · Argo – An area-efficient time-division-multiplexed network-on-chip for real-time multicore platforms . Jens Sparsø](https://reader036.vdocument.in/reader036/viewer/2022071217/604d4932464e6c254c3ecc65/html5/thumbnails/13.jpg)
05/07/2016
Jens Sparsø / RTN 2016 Keynote 14 DTU Compute, Technical University of Denmark
Processor
P1
Processor
P2
Processor
P4
Processor
P4
Processor
P5
Processor
P6
X X X
X X X
Global schedule: 1) Departure time 2) Route
Packet ≈
![Page 14: Argo - An area-efficient time-division-multiplexed network ... · Argo – An area-efficient time-division-multiplexed network-on-chip for real-time multicore platforms . Jens Sparsø](https://reader036.vdocument.in/reader036/viewer/2022071217/604d4932464e6c254c3ecc65/html5/thumbnails/14.jpg)
05/07/2016
Jens Sparsø / RTN 2016 Keynote 15 DTU Compute, Technical University of Denmark
Processor
P1
Processor
P2
Processor
P4
Processor
P4
Processor
P5
Processor
P6
X X X
X X X
Global schedule: 1) Departure time 2) Route
Packet ≈
![Page 15: Argo - An area-efficient time-division-multiplexed network ... · Argo – An area-efficient time-division-multiplexed network-on-chip for real-time multicore platforms . Jens Sparsø](https://reader036.vdocument.in/reader036/viewer/2022071217/604d4932464e6c254c3ecc65/html5/thumbnails/15.jpg)
05/07/2016
Jens Sparsø / RTN 2016 Keynote 16 DTU Compute, Technical University of Denmark
Processor
P1
Processor
P2
Processor
P4
Processor
P4
Processor
P5
Processor
P6
X X X
X X X
Global schedule: 1) Departure time 2) Route
Packet ≈
![Page 16: Argo - An area-efficient time-division-multiplexed network ... · Argo – An area-efficient time-division-multiplexed network-on-chip for real-time multicore platforms . Jens Sparsø](https://reader036.vdocument.in/reader036/viewer/2022071217/604d4932464e6c254c3ecc65/html5/thumbnails/16.jpg)
05/07/2016
Jens Sparsø / RTN 2016 Keynote 17 DTU Compute, Technical University of Denmark
Processor
P1
Processor
P2
Processor
P4
Processor
P4
Processor
P5
Processor
P6
X X X
X X X
Global schedule: 1) Departure time 2) Route
Packet ≈
![Page 17: Argo - An area-efficient time-division-multiplexed network ... · Argo – An area-efficient time-division-multiplexed network-on-chip for real-time multicore platforms . Jens Sparsø](https://reader036.vdocument.in/reader036/viewer/2022071217/604d4932464e6c254c3ecc65/html5/thumbnails/17.jpg)
05/07/2016
Jens Sparsø / RTN 2016 Keynote 19 DTU Compute, Technical University of Denmark
• 3-flit package and 3-stage pipelined router: (TDM slot is 3 clock cycles)
• Variable length packets and arbitrary pipeline depth of router (TDM slot is 1 clock cycle)
TDM schedule + granularity
c4
TDM period = 6 slots = 18 clock cycles
c23 c4 H D D H D D H D D
1 2 3 4 5 6 1
c4
TDM period = 18 clocks cycles
c23 c4 H D D D H D D H D
1 2 3 4 9 5 6 7 8 10 11 12 13 14 15 16 17 18 1 2 3 4
![Page 18: Argo - An area-efficient time-division-multiplexed network ... · Argo – An area-efficient time-division-multiplexed network-on-chip for real-time multicore platforms . Jens Sparsø](https://reader036.vdocument.in/reader036/viewer/2022071217/604d4932464e6c254c3ecc65/html5/thumbnails/18.jpg)
05/07/2016
Jens Sparsø / RTN 2016 Keynote 20 DTU Compute, Technical University of Denmark
Traditional design: DMAs in processor nodes
![Page 19: Argo - An area-efficient time-division-multiplexed network ... · Argo – An area-efficient time-division-multiplexed network-on-chip for real-time multicore platforms . Jens Sparsø](https://reader036.vdocument.in/reader036/viewer/2022071217/604d4932464e6c254c3ecc65/html5/thumbnails/19.jpg)
05/07/2016
Jens Sparsø / RTN 2016 Keynote 21 DTU Compute, Technical University of Denmark
Traditional design: DMAs in processor nodes
• Data is copied (across CDC) from SPM to VC-buffer in NI = latency
• Physical buffer per virtual circuit = area • Flow control: VC-buffer vs. DMA
![Page 20: Argo - An area-efficient time-division-multiplexed network ... · Argo – An area-efficient time-division-multiplexed network-on-chip for real-time multicore platforms . Jens Sparsø](https://reader036.vdocument.in/reader036/viewer/2022071217/604d4932464e6c254c3ecc65/html5/thumbnails/20.jpg)
05/07/2016
Jens Sparsø / RTN 2016 Keynote 22 DTU Compute, Technical University of Denmark
Traditional design: DMAs in processor nodes
• NoC transfer data: - from VC-buffer in sender NI - to VC-buffer in receiver NI
• Credit based flow control = area
![Page 21: Argo - An area-efficient time-division-multiplexed network ... · Argo – An area-efficient time-division-multiplexed network-on-chip for real-time multicore platforms . Jens Sparsø](https://reader036.vdocument.in/reader036/viewer/2022071217/604d4932464e6c254c3ecc65/html5/thumbnails/21.jpg)
05/07/2016
Jens Sparsø / RTN 2016 Keynote 23 DTU Compute, Technical University of Denmark
Traditional design: DMAs in processor nodes
• Data is copied (across CDC) from NI to SPM = latency
• Physical buffer per virtual circuit • Flow control: buffer vs. DMA
![Page 22: Argo - An area-efficient time-division-multiplexed network ... · Argo – An area-efficient time-division-multiplexed network-on-chip for real-time multicore platforms . Jens Sparsø](https://reader036.vdocument.in/reader036/viewer/2022071217/604d4932464e6c254c3ecc65/html5/thumbnails/22.jpg)
05/07/2016
Jens Sparsø / RTN 2016 Keynote 24 DTU Compute, Technical University of Denmark
New design: DMAs in network interfaces
TDM TDM
![Page 23: Argo - An area-efficient time-division-multiplexed network ... · Argo – An area-efficient time-division-multiplexed network-on-chip for real-time multicore platforms . Jens Sparsø](https://reader036.vdocument.in/reader036/viewer/2022071217/604d4932464e6c254c3ecc65/html5/thumbnails/23.jpg)
05/07/2016
Jens Sparsø / RTN 2016 Keynote 25 DTU Compute, Technical University of Denmark
S
S
SM M
S
DMA tableWord countWrite ptr. Read ptr.
S
TDM counter
+1 +1
SPM
M
S S
Router
Outgoing packets Incoming packets
Data
AddressAddress Data
Flits Flits
To Processor (Data memory bus)
MUX DEMUX
To Processor (Data memory bus)
RouteSlot table
DMA indx
Dest. Addr.Route
– 1
Data
Header Payload
Dest. Addr.Route Data
PayloadHeader
Read Write
TRANSMIT
RECEIVE
![Page 24: Argo - An area-efficient time-division-multiplexed network ... · Argo – An area-efficient time-division-multiplexed network-on-chip for real-time multicore platforms . Jens Sparsø](https://reader036.vdocument.in/reader036/viewer/2022071217/604d4932464e6c254c3ecc65/html5/thumbnails/24.jpg)
05/07/2016
Jens Sparsø / RTN 2016 Keynote 26 DTU Compute, Technical University of Denmark
NI synthesized for Altera EP2C70 FPGA TDM period = 3 clock cycles.
Slot
![Page 25: Argo - An area-efficient time-division-multiplexed network ... · Argo – An area-efficient time-division-multiplexed network-on-chip for real-time multicore platforms . Jens Sparsø](https://reader036.vdocument.in/reader036/viewer/2022071217/604d4932464e6c254c3ecc65/html5/thumbnails/25.jpg)
05/07/2016
Jens Sparsø / RTN 2016 Keynote 27 DTU Compute, Technical University of Denmark
ASIC Results for a 16-node bi-torus NoC • 4 x 4 bi-torus (16 NIs and 16 routers) • All-to-all schedule. TDM-period = 23 slots = 69 clock cycles • Total of 16 x 15 = 240 virtual circuits • 65 nm CMOS • 1.5 x 1.5 mm2 tile size
![Page 26: Argo - An area-efficient time-division-multiplexed network ... · Argo – An area-efficient time-division-multiplexed network-on-chip for real-time multicore platforms . Jens Sparsø](https://reader036.vdocument.in/reader036/viewer/2022071217/604d4932464e6c254c3ecc65/html5/thumbnails/26.jpg)
05/07/2016
Jens Sparsø / RTN 2016 Keynote 28 DTU Compute, Technical University of Denmark
ASIC Results for a 16-node bi-torus NoC • 4 x 4 bi-torus (16 NIs and 16 routers) • All-to-all schedule. TDM-period = 23 slots = 69 clock cycles • Total of 16 x 15 = 240 virtual circuits • 65 nm CMOS • 1.5 x 1.5 mm2 tile size
![Page 27: Argo - An area-efficient time-division-multiplexed network ... · Argo – An area-efficient time-division-multiplexed network-on-chip for real-time multicore platforms . Jens Sparsø](https://reader036.vdocument.in/reader036/viewer/2022071217/604d4932464e6c254c3ecc65/html5/thumbnails/27.jpg)
05/07/2016
Jens Sparsø / RTN 2016 Keynote 29 DTU Compute, Technical University of Denmark
Results summary
• Relative area:
• Other routers and NI’s ? / easily 10+
Argo Other TDM NoC Router 1 1 (synchronous)
2 (mesochronous) NI 1 2-4
![Page 28: Argo - An area-efficient time-division-multiplexed network ... · Argo – An area-efficient time-division-multiplexed network-on-chip for real-time multicore platforms . Jens Sparsø](https://reader036.vdocument.in/reader036/viewer/2022071217/604d4932464e6c254c3ecc65/html5/thumbnails/28.jpg)
05/07/2016
Jens Sparsø / RTN 2016 Keynote 30 DTU Compute, Technical University of Denmark
Outline 1. Introduction 2. Background
– The T-CREST multi-core platform – Message passing, TDM and static scheduling
3. Architecture and implementation of Argo (globally synchronous) – Router – NI microarchitecture – Results (area)
4. Timing organization of Argo – Individually clocked processor cores – Mesochronous NIs – Network of asynchronous routers
5. Analysis of (clock and reset) skew tolerance 6. Generating schedules 7. Conclusion
![Page 29: Argo - An area-efficient time-division-multiplexed network ... · Argo – An area-efficient time-division-multiplexed network-on-chip for real-time multicore platforms . Jens Sparsø](https://reader036.vdocument.in/reader036/viewer/2022071217/604d4932464e6c254c3ecc65/html5/thumbnails/29.jpg)
05/07/2016
Jens Sparsø / RTN 2016 Keynote 32 DTU Compute, Technical University of Denmark
Timing organization? - mesochronous (same oscillator, some skew)
Core Core
Core
Network interface
Router
Link
![Page 30: Argo - An area-efficient time-division-multiplexed network ... · Argo – An area-efficient time-division-multiplexed network-on-chip for real-time multicore platforms . Jens Sparsø](https://reader036.vdocument.in/reader036/viewer/2022071217/604d4932464e6c254c3ecc65/html5/thumbnails/30.jpg)
05/07/2016
Jens Sparsø / RTN 2016 Keynote 33 DTU Compute, Technical University of Denmark
Mesochronous router - synchronous router + bi-synchronous FIFOs
HPU
HPU
HPU
. . .
. . .
. . .
. . .
X bar
Link
Link
Link
Link
Link
Link
Bi-synchronous FIFOs
Area= 1-2 x Router !!
![Page 31: Argo - An area-efficient time-division-multiplexed network ... · Argo – An area-efficient time-division-multiplexed network-on-chip for real-time multicore platforms . Jens Sparsø](https://reader036.vdocument.in/reader036/viewer/2022071217/604d4932464e6c254c3ecc65/html5/thumbnails/31.jpg)
05/07/2016
Jens Sparsø / RTN 2016 Keynote 34 DTU Compute, Technical University of Denmark
Timing organization? - Globally-Asynchronous Locally-Synchronous
Core Core
Core
Network interface
Router
Link
![Page 32: Argo - An area-efficient time-division-multiplexed network ... · Argo – An area-efficient time-division-multiplexed network-on-chip for real-time multicore platforms . Jens Sparsø](https://reader036.vdocument.in/reader036/viewer/2022071217/604d4932464e6c254c3ecc65/html5/thumbnails/32.jpg)
05/07/2016
Jens Sparsø / RTN 2016 Keynote 35 DTU Compute, Technical University of Denmark
1) Asynchronous routers and links
Request Acknowledge
Data
• Link:
![Page 33: Argo - An area-efficient time-division-multiplexed network ... · Argo – An area-efficient time-division-multiplexed network-on-chip for real-time multicore platforms . Jens Sparsø](https://reader036.vdocument.in/reader036/viewer/2022071217/604d4932464e6c254c3ecc65/html5/thumbnails/33.jpg)
05/07/2016
Jens Sparsø / RTN 2016 Keynote 36 DTU Compute, Technical University of Denmark
2) Mesochronous Network Interfaces • Mesochronous =
–A single oscillator –Bounded skew (possibly varying)
![Page 34: Argo - An area-efficient time-division-multiplexed network ... · Argo – An area-efficient time-division-multiplexed network-on-chip for real-time multicore platforms . Jens Sparsø](https://reader036.vdocument.in/reader036/viewer/2022071217/604d4932464e6c254c3ecc65/html5/thumbnails/34.jpg)
05/07/2016
Jens Sparsø / RTN 2016 Keynote 37 DTU Compute, Technical University of Denmark
3) Independently clocked IP coores • Different cores have
different “natural” speeds
• Frequency scaling • Voltage scaling
Core Core
![Page 35: Argo - An area-efficient time-division-multiplexed network ... · Argo – An area-efficient time-division-multiplexed network-on-chip for real-time multicore platforms . Jens Sparsø](https://reader036.vdocument.in/reader036/viewer/2022071217/604d4932464e6c254c3ecc65/html5/thumbnails/35.jpg)
05/07/2016
Jens Sparsø / RTN 2016 Keynote 38 DTU Compute, Technical University of Denmark
Clocking strategy -mesochronous (same oscillator, some skew)
Mesochronous NIs
Asynchronous links and routers
Individually clocked
processor cores
TDM TDM
![Page 36: Argo - An area-efficient time-division-multiplexed network ... · Argo – An area-efficient time-division-multiplexed network-on-chip for real-time multicore platforms . Jens Sparsø](https://reader036.vdocument.in/reader036/viewer/2022071217/604d4932464e6c254c3ecc65/html5/thumbnails/36.jpg)
05/07/2016
Jens Sparsø / RTN 2016 Keynote 39 DTU Compute, Technical University of Denmark
Timing Organization of ARGO
R
R R R
R R
NI
NI
CPU $ SPM
CPU $ SPM
Mesochronous NIs
Asynchronous routers
Individually clocked processors
NI
CPU $ SPM
NI
CPU $ SPM
NI
CPU $ SPM
NI
CPU $ SPM
![Page 37: Argo - An area-efficient time-division-multiplexed network ... · Argo – An area-efficient time-division-multiplexed network-on-chip for real-time multicore platforms . Jens Sparsø](https://reader036.vdocument.in/reader036/viewer/2022071217/604d4932464e6c254c3ecc65/html5/thumbnails/37.jpg)
05/07/2016
Jens Sparsø / RTN 2016 Keynote 40 DTU Compute, Technical University of Denmark
Asynchronous router for source-routed TDM-based NoC
HPU
HPU
HPU
. . .
. . .
. . .
X bar
Link
Link
Link
Link
Link
Join
Req Ack
Req Ack
CTL
Data (Phit) Data (Phit) Latch
en
Lt
Note: A token is a flit: - a word in a packet - a void
Fork
![Page 38: Argo - An area-efficient time-division-multiplexed network ... · Argo – An area-efficient time-division-multiplexed network-on-chip for real-time multicore platforms . Jens Sparsø](https://reader036.vdocument.in/reader036/viewer/2022071217/604d4932464e6c254c3ecc65/html5/thumbnails/38.jpg)
05/07/2016
Jens Sparsø / RTN 2016 Keynote 41 DTU Compute, Technical University of Denmark
Timing Organization of ARGO
R
R R R
R R
NI
NI
CPU $ SPM
CPU $ SPM
Mesochronous NIs
Asynchronous routers
Individually clocked processors
NI
CPU $ SPM
NI
CPU $ SPM
NI
CPU $ SPM
NI
CPU $ SPM
![Page 39: Argo - An area-efficient time-division-multiplexed network ... · Argo – An area-efficient time-division-multiplexed network-on-chip for real-time multicore platforms . Jens Sparsø](https://reader036.vdocument.in/reader036/viewer/2022071217/604d4932464e6c254c3ecc65/html5/thumbnails/39.jpg)
05/07/2016
Jens Sparsø / RTN 2016 Keynote 42 DTU Compute, Technical University of Denmark
Understanding the NoC
. . .
HPU
X bar
HPU
HPU
. . .
. . .
![Page 40: Argo - An area-efficient time-division-multiplexed network ... · Argo – An area-efficient time-division-multiplexed network-on-chip for real-time multicore platforms . Jens Sparsø](https://reader036.vdocument.in/reader036/viewer/2022071217/604d4932464e6c254c3ecc65/html5/thumbnails/40.jpg)
05/07/2016
Jens Sparsø / RTN 2016 Keynote 43 DTU Compute, Technical University of Denmark
Understanding the NoC
NI NI
NI IP IP
IP
Asynchronous
Mesochronous
Independently Clocked IPs
![Page 41: Argo - An area-efficient time-division-multiplexed network ... · Argo – An area-efficient time-division-multiplexed network-on-chip for real-time multicore platforms . Jens Sparsø](https://reader036.vdocument.in/reader036/viewer/2022071217/604d4932464e6c254c3ecc65/html5/thumbnails/41.jpg)
05/07/2016
Jens Sparsø / RTN 2016 Keynote 44 DTU Compute, Technical University of Denmark
Resetting the NoC
NI NI
NI IP IP
IP
Asynchronous
Mesochronous
Independently Clocked IPs
![Page 42: Argo - An area-efficient time-division-multiplexed network ... · Argo – An area-efficient time-division-multiplexed network-on-chip for real-time multicore platforms . Jens Sparsø](https://reader036.vdocument.in/reader036/viewer/2022071217/604d4932464e6c254c3ecc65/html5/thumbnails/42.jpg)
05/07/2016
Jens Sparsø / RTN 2016 Keynote 45 DTU Compute, Technical University of Denmark
Resetting the NoC
NI NI
NI IP IP
IP
Asynchronous
Mesochronous
Independently Clocked IPs
![Page 43: Argo - An area-efficient time-division-multiplexed network ... · Argo – An area-efficient time-division-multiplexed network-on-chip for real-time multicore platforms . Jens Sparsø](https://reader036.vdocument.in/reader036/viewer/2022071217/604d4932464e6c254c3ecc65/html5/thumbnails/43.jpg)
05/07/2016
Jens Sparsø / RTN 2016 Keynote 46 DTU Compute, Technical University of Denmark
Mesochronous-Asynchronous Interface
• Mesochronous NIs – Producer and consumer operate at
TDM clk : 𝑇𝑐𝑐𝑐 – Skew in reset and clock results in
phase shift (±δ ) among TDM schedules in NIs. Possibly more than 1 cycle!
• Timing assumption:
– Routers are faster than NIs 𝑇𝐻𝑎𝑎𝑎𝑎𝑎𝑎𝑐𝑎 < 𝑇𝑐𝑐𝑐
– Producer ignores ack consumer ignores req
no synchronization no metastability
NI1 NI2
clk clk’ rst rst’
Producer Consumer
req ack ack (NC)
req (NC)
![Page 44: Argo - An area-efficient time-division-multiplexed network ... · Argo – An area-efficient time-division-multiplexed network-on-chip for real-time multicore platforms . Jens Sparsø](https://reader036.vdocument.in/reader036/viewer/2022071217/604d4932464e6c254c3ecc65/html5/thumbnails/44.jpg)
05/07/2016
Jens Sparsø / RTN 2016 Keynote 47 DTU Compute, Technical University of Denmark
Mesochronous-Asynchronous Interface
• Mesochronous NIs – Producer and consumer operate at
TDM clk : 𝑇𝑐𝑐𝑐 – Skew in reset and clock results in
phase shift (±δ ) among TDM schedules in NIs. Possibly more than 1 cycle!
• Timing assumption:
– Routers are faster than NIs 𝑇𝐻𝑎𝑎𝑎𝑎𝑎𝑎𝑐𝑎 < 𝑇𝑐𝑐𝑐
– Producer ignores ack consumer ignores req
no synchronization no metastability
NI1 NI2
clk clk’ rst rst’
Producer Consumer
req ack ack (NC)
req (NC)
1 2 3
1 2 3
Clock skew
TDM skew
NI1
NI2
![Page 45: Argo - An area-efficient time-division-multiplexed network ... · Argo – An area-efficient time-division-multiplexed network-on-chip for real-time multicore platforms . Jens Sparsø](https://reader036.vdocument.in/reader036/viewer/2022071217/604d4932464e6c254c3ecc65/html5/thumbnails/45.jpg)
05/07/2016
Jens Sparsø / RTN 2016 Keynote 51 DTU Compute, Technical University of Denmark
Tolerating skew – How much? • Producer and consumer drive FIFO below its max. speed • Speed of a FIFO depends on how full it is • Skew tolerance = F(Structure, Lr, Lf, Tclk)
Max speed of FIFO (i.e., cycle time)
FIFO stages per token/phit
Skew (more full)
Skew (more empty)
Clk
Lf
Lr
Ideal=Reset state
![Page 46: Argo - An area-efficient time-division-multiplexed network ... · Argo – An area-efficient time-division-multiplexed network-on-chip for real-time multicore platforms . Jens Sparsø](https://reader036.vdocument.in/reader036/viewer/2022071217/604d4932464e6c254c3ecc65/html5/thumbnails/46.jpg)
05/07/2016
Jens Sparsø / RTN 2016 Keynote 52 DTU Compute, Technical University of Denmark
Performance Analysis of Concurrent Systems • System model
– Timed marked graph (a subclass of Petri nets) – Time Separation of Events (TSE)
• Two classes of algorithms: – Steady state average TSE – Worst case TSE (possibly an initial phase followed by oscillations)
• We have used the worst-case TSE algorithm of Hulgaard et al.: – H. Hulgaard, S. M. Burns, T. Amon, and G. Borriello. An algorithm
for exact bounds on the time separation of events in concurrent systems. IEEE Transactions on Computers, 44(11), Nov. 1995.
• Model 1: Detailed model of handshake latch implementation. • Model 2: Coarser model of handshake latch implementation:
– Analysis of complete 2 x 2 NoC (same results) – Study of possible oscillations of max TSE.
![Page 47: Argo - An area-efficient time-division-multiplexed network ... · Argo – An area-efficient time-division-multiplexed network-on-chip for real-time multicore platforms . Jens Sparsø](https://reader036.vdocument.in/reader036/viewer/2022071217/604d4932464e6c254c3ecc65/html5/thumbnails/47.jpg)
05/07/2016
Jens Sparsø / RTN 2016 Keynote 53 DTU Compute, Technical University of Denmark
Example of coarse model • Ring w. 4 handshake-latches and 2 tokens
• Time separation between successive tokens written into latch a: 4, 8, 4, 8, … average is 6
a
d c
b
c d
b a
3 1
3
3
3
1 1
1
Forward latency: Lf=1
Reverse latency: Lr=3
Circuit: Timed marked graph:
![Page 48: Argo - An area-efficient time-division-multiplexed network ... · Argo – An area-efficient time-division-multiplexed network-on-chip for real-time multicore platforms . Jens Sparsø](https://reader036.vdocument.in/reader036/viewer/2022071217/604d4932464e6c254c3ecc65/html5/thumbnails/48.jpg)
05/07/2016
Jens Sparsø / RTN 2016 Keynote 55 DTU Compute, Technical University of Denmark
Coarse-Grained Model of 2x2 NoC • Pipeline stage = node • Edges annotated with
Lf and Lr
• Case 1: Skew among neighbour nodes
– d1 – d2=d3=0
• Case 2: Skew among diagonal nodes
– d2 – d1=d3=0
![Page 49: Argo - An area-efficient time-division-multiplexed network ... · Argo – An area-efficient time-division-multiplexed network-on-chip for real-time multicore platforms . Jens Sparsø](https://reader036.vdocument.in/reader036/viewer/2022071217/604d4932464e6c254c3ecc65/html5/thumbnails/49.jpg)
05/07/2016
Jens Sparsø / RTN 2016 Keynote 56 DTU Compute, Technical University of Denmark
Skew – Period Results
• Neighboring nodes (case 1) is the worst-case skew tolerance
![Page 50: Argo - An area-efficient time-division-multiplexed network ... · Argo – An area-efficient time-division-multiplexed network-on-chip for real-time multicore platforms . Jens Sparsø](https://reader036.vdocument.in/reader036/viewer/2022071217/604d4932464e6c254c3ecc65/html5/thumbnails/50.jpg)
05/07/2016
Jens Sparsø / RTN 2016 Keynote 57 DTU Compute, Technical University of Denmark
Generating schedules • Metaheuristic scheduler
– Input: Core communication graph w. bandwidth requirements – Output: Schedule + (clock)frequency
• Scheduler works on normalized BW requirements
– Lowest BW requirement assigned 1 slot – Highest BW requirement assigned n slots (n can be large) – Schedules can be compressed by over-assigning BW to channels with
small BW requirements. For a range of benchmarks we found that compressing to 100 slot TDM periods is possible with negligible effects (i.e., need to increase clock frequency)
• NoC is effecticely drained for packets between schedule periods (except
for pipeline depth in shortest NI-to-NI path. – Perspectives for fast reconfiguration (support for mode changes)
![Page 51: Argo - An area-efficient time-division-multiplexed network ... · Argo – An area-efficient time-division-multiplexed network-on-chip for real-time multicore platforms . Jens Sparsø](https://reader036.vdocument.in/reader036/viewer/2022071217/604d4932464e6c254c3ecc65/html5/thumbnails/51.jpg)
05/07/2016
Jens Sparsø / RTN 2016 Keynote 58 DTU Compute, Technical University of Denmark
Conclusions • The Argo NoC combines TDM and GALS
– Asynchronous routers – Mesochronous NIs – Independently clocked processor cores
• Significant time-elasticity provided by network of asynchronous routers.
• Small and efficient implementation
– Avoids run time synchronization, arbitration and buffering – End-to-end, SPM-to-SPM transfer of data controlled by TDM schedule.
![Page 52: Argo - An area-efficient time-division-multiplexed network ... · Argo – An area-efficient time-division-multiplexed network-on-chip for real-time multicore platforms . Jens Sparsø](https://reader036.vdocument.in/reader036/viewer/2022071217/604d4932464e6c254c3ecc65/html5/thumbnails/52.jpg)
05/07/2016
Jens Sparsø / RTN 2016 Keynote 59 DTU Compute, Technical University of Denmark
References • Network interface:
– J. Sparsø, E. Kasapaki, and M. Schoeberl, “An area-efficient network interface for a TDM-based network-on-chip,” in Proc. Design, Automation and Test in Europe (DATE), 2013, pp. 1044–1047.
– R. B. Sørensen, L. Pezzarossa, and J. Sparsø, “An area-efficient TDM NoC supporting reconfiguration for mode changes,” in Proceedings of the International Symposium on Networks-on-Chip (NOCS). IEEE, 2016.
• Asynchronous router and timing analysis: – E. Kasapaki and J. Sparsø,“Argo: A Time-Elastic Time-Division-Multiplexed NoC
using Asynchronous Routers,” in Proc. IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC). IEEE Computer Society Press, 2014, pp. 45–52.
– E. Kasapaki and J. Sparsø, “The Argo NoC: Combining TDM and GALS”, in Proc. European Conference on Circuit Theory and Design (ECCTD), 2015, pp. 1-4.
• Scheduler – R. B. Sørensen, J. Sparsø, M. R. Pedersen, and J. Højgaard, “A Metaheuristic
Scheduler for Time Division Multiplexed Networks-on-Chip,” in Proc. IEEE/IFIP Workshop on Software Technologies for Future Embedded and Ubiquitous Systems (SEUS), 2014, pp. 309–316.
![Page 53: Argo - An area-efficient time-division-multiplexed network ... · Argo – An area-efficient time-division-multiplexed network-on-chip for real-time multicore platforms . Jens Sparsø](https://reader036.vdocument.in/reader036/viewer/2022071217/604d4932464e6c254c3ecc65/html5/thumbnails/53.jpg)
05/07/2016
Jens Sparsø / RTN 2016 Keynote 60 DTU Compute, Technical University of Denmark
References (cont.) • Journal articles with a broader perspective
– E. Kasapaki, M. Schoeberl, R. B. Sørensen, C. T. Müller, K. Goossens, and J. Sparsø, “Argo: A Real-Time Network-on-Chip Architecture with an Efficient GALS Implementation,” IEEE Transactions on VLSI Systems, vol. 24, no. 2, pp. 479–492, 2015.
– M. Schoeberl, S. Abbaspour, B. Akesson, N. Audsley, R. Capasso, J. Garside, K. Goossens, S. Goossens, S. Hansen, R. Heckmann, S. Hepp, B. Huber, A. Jordan, E. Kasapaki, J. Knoop, Y. Li, D. Prokesch, W. Puffitsch, P. Puschner, A. Rocha, C. Silva, J. Sparso, and A. Tocchi. “T-CREST: Time-predictable multi-core architecture for embedded systems,” Journal of Systems Architecture, 61(9):449-471, 2015.
• Open source: – Argo hardware: https://github.com/t-crest/argo – Scheduler: https://github.com/t-crest/poseidon
• Starting point for more information about the T-CREST plaform:
– http://patmos.compute.dtu.dk/
![Page 54: Argo - An area-efficient time-division-multiplexed network ... · Argo – An area-efficient time-division-multiplexed network-on-chip for real-time multicore platforms . Jens Sparsø](https://reader036.vdocument.in/reader036/viewer/2022071217/604d4932464e6c254c3ecc65/html5/thumbnails/54.jpg)
05/07/2016
Jens Sparsø / RTN 2016 Keynote 61 DTU Compute, Technical University of Denmark
END