Track B: Multicores & Network On Chip Architectures / Oren Hollander
May 1, 2013
ChipEx 2013
Multicores & Network On Chip Architectures: Trends & Design Considerations
ALL Rights Reserved
Oren Hollander, FPGA & ARM Expert
What is SoC?
• On-chip integration of a variety of functional hardware blocks to suit a specific product application
  – CPU/CPUs + accelerators (GPU, VPU, IPU, etc.)
  – Small form factor
  – High volume of peripherals
• Blocks can operate at lower frequencies while delivering higher system-level performance and consuming much lower system-level power
Enables rich features at reasonable computing speed and price points
SoC Trends
• Apple acquired PA-Semi
  – Enabling it to design its own application processors
• Qualcomm acquired Atheros and Summit
  – Strengthening its wireless connectivity suite (Atheros) and its power management capability (Summit technology)
• Nvidia acquired Icera
  – Strengthening its connectivity offering
• Intel acquired Infineon Wireless
  – Gaining entry into the baseband connectivity market
In just five years, SoC technology has gone from enabling basic computation and connectivity on feature phones to being at the heart of all smartphones and early-stage ultrabooks, capable of a wide range of functions including audio/video, gaming, communication and productivity
ARM Connected Community – 800+
SoC Examples
i.MX 6Quad/6Dual
CPU Platform
• Dual / Quad Cortex-A9
• 32KB I-cache per core, 32KB D-cache per core
• NEON per core, PTM per core
• 1MB L2-cache + VFPv3
System Control
• Secure JTAG
• PLL, Osc
• Clock & Reset
• Smart DMA
• IOMUX
• Timer x3, PWM x4, Watch Dog x2
Security
• RNG, Ciphers
• TrustZone, Security Ctrl
• Secure RTC, eFuses
Internal Memory
• ROM, RAM
Power Mgmt
• Power Supplies
• Temp Monitor
Multimedia
• Graphics: OpenGL/ES 2.x, OpenCL/EP, OpenVG 1.x
• Video Codecs: 1080p30
• Audio: ASRC
• 2x Imaging Processing Unit: resizing & blending, inversion / rotation, image enhancement
• LCD & Camera Interface: HDMI & PHY, MIPI DSI, 24-bit RGB, LVDS (x3-8), MIPI CSI2, 20-bit CSI
Connectivity
• LP-DDR2, DDR3 / LV-DDR3 x32/64, 533 MHz
• MMC 4.4 / SD 3.0 x3, MMC 4.4 / SDXC
• NAND Ctrl (BCH40)
• UART x5, 5Mbps; I2C x3; SPI x5
• ESAI, I2S/SSI x3, S/PDIF Tx/Rx
• 3.3V GPIO, Keypad
• USB2 OTG & PHY, USB2 Host & PHY, USB2 HSIC Host x2
• MIPI HSI
• PCIe 2.0 (1-lane)
• 1Gb Ethernet + IEEE1588
• S-ATA & PHY 3Gbps
• FlexCAN x2, MLB150 + DTCP
What is NoC?
• A NoC is a network of computational, storage and I/O resources, interconnected by a network of switches
  – Connects processing cores and subsystems in multiprocessor systems-on-chip (MPSoCs)
• One of the main components of a NoC is the router, which is attached to a processing core (CPU or hardware accelerator) and transfers messages from one processing core to another
  – Resources communicate with each other using addressed data packets, routed to their destination by the switch fabric
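To make "addressed packets routed by the switch fabric" concrete, here is a minimal Python sketch of one common router policy, dimension-order (XY) routing on a 2D mesh. The slides do not name a topology or routing algorithm: the mesh, the (x, y) addressing and the names `Packet`, `next_hop` and `route` are assumptions for illustration only.

```python
"""Dimension-order (XY) routing on a 2D-mesh NoC: a minimal sketch.

Assumption: routers sit on a grid, each attached to one core, and a
packet's header carries the (x, y) address of its destination router.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src: tuple[int, int]  # (x, y) of the sending router
    dst: tuple[int, int]  # (x, y) of the destination router
    payload: bytes


def next_hop(here: tuple[int, int], dst: tuple[int, int]) -> str:
    """Return the output port a router at `here` uses for a packet to `dst`."""
    x, y = here
    dx, dy = dst
    # Route fully along X first, then along Y (deadlock-free on a mesh).
    if dx > x:
        return "EAST"
    if dx < x:
        return "WEST"
    if dy > y:
        return "NORTH"
    if dy < y:
        return "SOUTH"
    return "LOCAL"  # arrived: deliver to the attached core


STEP = {"EAST": (1, 0), "WEST": (-1, 0), "NORTH": (0, 1), "SOUTH": (0, -1)}


def route(pkt: Packet) -> list[tuple[int, int]]:
    """Return the routers the packet traverses, source to destination."""
    path = [pkt.src]
    here = pkt.src
    while (port := next_hop(here, pkt.dst)) != "LOCAL":
        sx, sy = STEP[port]
        here = (here[0] + sx, here[1] + sy)
        path.append(here)
    return path


if __name__ == "__main__":
    p = Packet(src=(0, 0), dst=(2, 1), payload=b"hello")
    print(route(p))  # [(0, 0), (1, 0), (2, 0), (2, 1)]
```

Each router only needs the destination address and its own position to pick an output port, which is what lets the fabric scale without a central arbiter.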
Why do we need NoC?
• State-of-the-art SoC communication architectures are starting to face scalability and modularity limitations
  – More advanced bus specifications are emerging to deal with these issues, at the expense of silicon area and complexity
• Communication architectures have evolved mainly in bus protocols (to better exploit the available bandwidth) and bus topologies (to increase bandwidth)
  – More aggressive solutions are needed to overcome the scalability limitation
• NoCs are currently viewed as a 'revolutionary' approach to providing a scalable, high-performance and robust infrastructure for on-chip communication
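One rough way to see the scalability limitation is to compare bisection bandwidth, the bandwidth across a cut that splits the cores into two halves. The Python sketch below is an illustrative model, not measured data: it assumes every link (including the bus) has the same bandwidth and that N cores form a square k x k mesh; the function names are hypothetical.

```python
"""Back-of-the-envelope scaling comparison: shared bus vs. 2D-mesh NoC.

Assumptions (illustrative only): every link, including the bus, carries
the same bandwidth `link_bw`, and N cores form a square k x k mesh.
"""
import math


def bus_bisection_bw(n_cores: int, link_bw: float) -> float:
    # A single shared bus is one link, whatever the core count.
    return link_bw


def mesh_bisection_bw(n_cores: int, link_bw: float) -> float:
    # Cutting a k x k mesh down the middle severs k links (per direction).
    k = math.isqrt(n_cores)
    assert k * k == n_cores, "sketch assumes a square mesh"
    return k * link_bw


if __name__ == "__main__":
    for n in (4, 16, 64):
        bus = bus_bisection_bw(n, 1.0)
        mesh = mesh_bisection_bw(n, 1.0)
        print(f"{n:3d} cores: bus {bus / n:.4f}/core, mesh {mesh / n:.4f}/core")
```

Under these assumptions the bus share per core falls as 1/N while the mesh share falls only as 1/sqrt(N), which is the gap that wider or faster buses can narrow only at the cost of area and complexity.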
NoC Example
Multicore Challenges
• Coherency between multiple cores
• Coherency between multiple clusters
• Homogeneous and heterogeneous MP
• Cluster booting
• System interrupts
• Tool issues (compiler & debugger)
• Energy
The ARM big.LITTLE Subsystem
• High-performance Cortex-A15 cluster
• Energy-efficient Cortex-A7 cluster
• CCI-400 provides cache coherency between clusters
• Shared GIC-400 interrupt controller
Note: the Cortex-A7 is not required to have an L2 cache for coherency management
[Figure: Cortex-A15 cluster (CPU 0, CPU 1) and Cortex-A7 cluster (CPU 0, CPU 1), each CPU with I$ and D$ and each cluster with an L2 cache + SCU, joined by the CCI-400 cache coherent interconnect; the GIC-400 distributor interface delivers interrupts to the CPU 0 to CPU 3 interfaces]
CCI-400 and System Coherency
• CCI-400: 2+3 slave interfaces, 3 master interfaces
  – 2 full AMBA 4 ACE slave interfaces
  – 3 ACE-Lite I/O-coherent slave interfaces
  – 3 ACE-Lite master interfaces
• CCI interfaces:
  – AMBA 4 ACE and ACE-Lite manage all coherency and barriers
  – Distributed Virtual Memory (DVM) signaling for the System MMU
Heterogeneous Multi-Processing
• An SMP OS runs across all CPUs in all clusters
• Some CPUs may be taken offline to save power
  – Possibly even all CPUs in a cluster
• The OS may support heterogeneous cluster configurations
  – The scheduler can limit resource-sensitive threads to a specific cluster
[Figure: one SMP operating system scheduling threads across Cluster 0 (four Cortex-A7 CPUs) and Cluster 1 (four Cortex-A15 CPUs)]
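The placement idea above can be sketched as a toy policy in Python. This is not a real kernel scheduler: it assumes each thread reports a load in [0, 1], treats threads above a hypothetical `UP_THRESHOLD` as resource-sensitive and restricts them to the Cortex-A15 cluster, runs everything else on the Cortex-A7 cluster, and counts CPUs with empty run queues as offline. `Cluster` and `place` are illustrative names.

```python
"""Toy HMP placement policy for a big.LITTLE system (a sketch only)."""
from dataclasses import dataclass, field

UP_THRESHOLD = 0.6  # hypothetical cut-off for "resource-sensitive" threads


@dataclass
class Cluster:
    name: str
    n_cpus: int
    runqueues: list[list[str]] = field(init=False)

    def __post_init__(self) -> None:
        # One run queue per CPU in the cluster.
        self.runqueues = [[] for _ in range(self.n_cpus)]

    def enqueue(self, thread: str) -> None:
        # Put the thread on the CPU with the shortest run queue.
        min(self.runqueues, key=len).append(thread)

    def online_cpus(self) -> int:
        # CPUs with nothing to run can be taken offline to save power.
        return sum(1 for rq in self.runqueues if rq)


def place(threads: dict[str, float]) -> tuple[Cluster, Cluster]:
    """Assign each thread (name -> load) to the LITTLE or big cluster."""
    little = Cluster("Cortex-A7", 4)
    big = Cluster("Cortex-A15", 4)
    for name, load in threads.items():
        target = big if load > UP_THRESHOLD else little
        target.enqueue(name)
    return little, big


if __name__ == "__main__":
    little, big = place({"ui": 0.2, "audio": 0.1, "game": 0.9})
    print(little.online_cpus(), big.online_cpus())  # 2 1
```

If no thread crosses the threshold, every big CPU stays idle and the whole Cortex-A15 cluster can be powered off, matching the "possibly even all CPUs in a cluster" case.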
Principles of Task Migration
• The system is running on Cluster 0; the Virtualizer decides more computational power is needed
• Cluster 1 is powered up
• Threads are migrated to Cluster 1, but Cluster 0's caches are kept powered so they can still be snooped
• When Cluster 0's caches have gone cold, the remaining system state is cleaned from Cluster 0 and Cluster 0 is powered down
[Figure: the SMP operating system and its threads, before and after the Virtualizer migrates them from Cluster 0 (four Cortex-A7 CPUs) to Cluster 1 (four Cortex-A15 CPUs)]
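The migration steps can be written as an explicit state machine. In this Python sketch the phase names are illustrative, power control is only recorded, and `caches_cold` is a hypothetical predicate standing in for whatever heuristic the Virtualizer uses to decide that Cluster 0's caches no longer hold useful data.

```python
"""The task-migration sequence as a simple state machine (a sketch)."""
from enum import Enum, auto
from typing import Callable


class Phase(Enum):
    RUN_ON_CLUSTER0 = auto()        # system running on Cluster 0
    CLUSTER1_POWERED_UP = auto()
    THREADS_ON_CLUSTER1 = auto()    # Cluster 0 caches still powered, snoopable
    CLUSTER0_CLEANED = auto()       # remaining state cleaned from Cluster 0
    CLUSTER0_POWERED_DOWN = auto()


def migrate(need_more_perf: bool, caches_cold: Callable[[], bool]) -> list[Phase]:
    """Walk through the migration sequence and return the phases visited."""
    phases = [Phase.RUN_ON_CLUSTER0]
    if not need_more_perf:
        return phases  # the Virtualizer leaves the system on Cluster 0
    phases.append(Phase.CLUSTER1_POWERED_UP)
    phases.append(Phase.THREADS_ON_CLUSTER1)
    # Keep Cluster 0's caches powered so they can still be snooped,
    # until the (hypothetical) caches_cold() check reports they are cold.
    while not caches_cold():
        pass
    phases.append(Phase.CLUSTER0_CLEANED)
    phases.append(Phase.CLUSTER0_POWERED_DOWN)
    return phases


if __name__ == "__main__":
    polls = iter([False, False, True])  # caches go cold on the third check
    print([p.name for p in migrate(True, lambda: next(polls))])
```

The key ordering constraint is that Cluster 0 is cleaned and powered down only after its caches have gone cold; powering it down earlier would force the new cluster to refetch that data from memory.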
Coherent multi-core
• In MPCore systems, a resource may be shared between threads running on different CPUs within the cluster
  – The coherency logic connects the Local Monitors in each of the CPUs in the cluster
[Figure: a Cortex-A MPCore cluster with two Cortex-A CPUs running Thread 0 and Thread 1, each CPU with a Local Monitor joined by the coherency logic; the cluster connects through an AXI Interconnect and a Global Monitor to Memory]
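On ARM cores the local and global monitors back the exclusive-access instructions LDREX and STREX, which threads use to update a shared resource safely. The Python sketch below is a simplified model, not the architectural specification: it treats all memory as shareable, tracks one reserved address per CPU, and lets a successful store-exclusive clear every reservation on that address, standing in for what the coherency logic does. `ExclusiveMonitors` and `atomic_increment` are hypothetical names.

```python
"""Simplified model of exclusive monitors guarding a shared counter."""
from typing import Optional


class ExclusiveMonitors:
    """Hypothetical model: one exclusive reservation per CPU."""

    def __init__(self, n_cpus: int) -> None:
        self.mem: dict[int, int] = {}
        self.reserved: list[Optional[int]] = [None] * n_cpus

    def ldrex(self, cpu: int, addr: int) -> int:
        # Load-exclusive: read the value and tag the address for this CPU.
        self.reserved[cpu] = addr
        return self.mem.get(addr, 0)

    def strex(self, cpu: int, addr: int, value: int) -> bool:
        # Store-exclusive fails (and clears the tag) if the reservation was lost.
        if self.reserved[cpu] != addr:
            self.reserved[cpu] = None
            return False
        self.mem[addr] = value
        # The write clears every CPU's reservation on addr, the writer's too.
        for other, tagged in enumerate(self.reserved):
            if tagged == addr:
                self.reserved[other] = None
        return True


def atomic_increment(m: ExclusiveMonitors, cpu: int, addr: int) -> int:
    # Classic load-exclusive / store-exclusive retry loop.
    while True:
        old = m.ldrex(cpu, addr)
        if m.strex(cpu, addr, old + 1):
            return old + 1


if __name__ == "__main__":
    m = ExclusiveMonitors(n_cpus=2)
    counter = 0x1000
    m.ldrex(0, counter)             # Thread 0 on CPU 0 reads 0
    m.ldrex(1, counter)             # Thread 1 on CPU 1 reads 0
    print(m.strex(1, counter, 1))   # True: CPU 1 wins
    print(m.strex(0, counter, 1))   # False: CPU 0's reservation was cleared
    print(atomic_increment(m, 0, counter))  # 2: CPU 0 retries and succeeds
```

The interleaving in the usage section shows why the monitors must be connected: without the coherency logic clearing CPU 0's reservation, both threads would write 1 and one increment would be lost.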
Summary
• Multicore, multiprocessing, SoC and NoC are today's prevailing technologies
• There are many challenges and considerations when designing and programming an MP system
• To get the best performance/power trade-off, you need the right architecture, tools and programming know-how