System IP for 64-bit Systems: ARM® CoreLink™ and CoreSight™ System IP
ARM Tech Symposia
November 2014
The Mobile Consumer Expects Something New Every Year
Flagship features, 2011 to 2014: dual-core CPU performance; >5.0 inch screens; 1080p 60fps; >13 MPixel camera
Your next flagship devices: ARMv8-A and the shift to 64-bit
Why 64-bit in Mobile?
Performance through architecture:
Cleaner instruction set architecture
Hard-float ABI by default in ARMv8-A
More registers, less stack spilling
Cheaper function calls
Up to 16x crypto acceleration
Preparation for larger-memory devices
Increasing Demand for System Bandwidth
[Chart: peak on-chip system bandwidth (GB/s, 0 to 50) by year of device shipping (2009 to 2015), driven by screen sizes and resolutions, capture and screen frame rates, and >20 Mpixel cameras with 4K output]
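For a sense of scale, raw display or capture bandwidth grows with resolution times frame rate. A back-of-envelope sketch, assuming 4 bytes per pixel, no compression and one pass over each frame (the helper name is hypothetical):

```python
# Back-of-envelope bandwidth for one uncompressed video stream.
# Assumptions (not from the slides): 4 bytes per pixel, each frame read or
# written exactly once, no framebuffer compression; 1 GB = 10**9 bytes.
def stream_bandwidth_gb_per_s(width: int, height: int, bytes_per_pixel: int, fps: int) -> float:
    """Return the sustained bandwidth of one stream in GB/s."""
    return width * height * bytes_per_pixel * fps / 1e9


streams = {
    "1080p60": (1920, 1080, 4, 60),
    "4K60": (3840, 2160, 4, 60),
}

for name, params in streams.items():
    print(f"{name}: {stream_bandwidth_gb_per_s(*params):.2f} GB/s")
```

It prints 0.50 GB/s for 1080p60 and 1.99 GB/s for 4K60: one 4K60 stream needs four times the bandwidth of 1080p60 before any GPU rendering, composition, video decode or camera traffic is counted.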
Designing Within an Energy and Thermal Envelope
Use cases: high-end, feature-rich gaming; video editing on the move
SoC mobile power envelope: 2.5-3 W, 4-5 W, 7 W
Mobile Application Workloads
Mobile users spend much of their time across a range of mobile applications*:
38% on web browsing and Facebook
32% on gaming
16% on audio, video and utility
Common “building blocks” in workloads:
Short bursts of high intensity
Long periods of sustained high intensity
Low intensity
[Charts: power over time for web browsing, gaming and audio playback, measured on a quad Cortex-A7 symmetric multiprocessing platform]
* Source: Flurry Analytics
Heterogeneous Computing: big.LITTLE Technology
“Right Task on the Right Core”
Architecturally identical processors: high-performance-tuned big cores and low-power-tuned LITTLE cores
Hardware coherency: Cache Coherent Interconnect (CCI), with L1 and L2 snooping between clusters
Seamless and automatic task allocation
2x higher performance vs. SMP*
Up to 75% CPU power savings vs. SMP*
Up to 40% SoC power savings**
[Diagram: big cluster and LITTLE cluster, each with its own L2 cache, joined by the Cache Coherent Interconnect, with shared interrupt control]
* Quad Cortex-A15 symmetric multiprocessing (SMP) system
** Measured across a set of casual games and common use cases on an ARM partner 4x Cortex-A15 + 4x Cortex-A7 big.LITTLE device
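The slides do not describe how Global Task Scheduling decides where a task runs. As an illustration only, a minimal sketch of one common policy, threshold-based migration with hysteresis; the load scale, thresholds and names are hypothetical and this is not ARM's scheduler implementation:

```python
from dataclasses import dataclass

# Hypothetical thresholds on a 0..1024 load scale; real schedulers tune these per platform.
UP_THRESHOLD = 700    # move a task to the big cluster above this load
DOWN_THRESHOLD = 300  # move it back to the LITTLE cluster below this load


@dataclass
class Task:
    name: str
    load: int                # tracked load, 0..1024
    cluster: str = "LITTLE"  # tasks start on the low-power cluster


def place(task: Task) -> str:
    """Threshold-with-hysteresis placement: a task only changes cluster when its
    load crosses the threshold for the direction it would move in."""
    if task.cluster == "LITTLE" and task.load > UP_THRESHOLD:
        task.cluster = "big"
    elif task.cluster == "big" and task.load < DOWN_THRESHOLD:
        task.cluster = "LITTLE"
    return task.cluster


# One task whose load rises in a burst, then falls away.
t = Task("render", load=0)
trace = []
for load in [100, 800, 650, 400, 250, 500]:
    t.load = load
    trace.append(place(t))
print(trace)
```

The gap between the two thresholds is the design choice: at loads of 650 or 400 the task stays where it is, so a load that hovers near a single cutoff does not ping-pong between clusters. Because the clusters are architecturally identical and hardware-coherent, the same binary runs on either cluster without software cache maintenance.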
big.LITTLE is Mainstream
Cortex-A15/A7 big.LITTLE in products in 2014: MediaTek MT8135, Samsung Exynos 5422, Allwinner A80
High-end mobile moving to Cortex-A57 and Cortex-A53 big.LITTLE
Benefits: additional high-end performance and 64-bit
Silicon expected in late 2014, e.g. Qualcomm Snapdragon 810, Samsung Exynos 7 Octa
Global Task Scheduling is now a point of differentiation: heterogeneous multi-processing (HMP) gives access to all cores
CoreLink CCI-400: System Coherency for 64-bit big.LITTLE
First of a generation supporting multi-cluster coherency; we are actively working on the next-generation CCI.
High-performance hub interconnect for smartphones and beyond: 2 CPU clusters, 8-core GPU, DMC
Performance and power efficiency with big.LITTLE
Supporting Cortex-A53 and Cortex-A57
Integrated clock gating
System-level hardware coherency: full coherency for CPUs, I/O coherency for the GPU
Mature and silicon proven, over 30 licensees
[Diagram: quad Cortex-A57 and quad Cortex-A53 clusters (each with L2) and a Mali GPU connected through the CoreLink™ CCI-400 Cache Coherent Interconnect with AMBA® 4 ACE™ to system and I/O and to DDR]
64-bit Mobile Sub-System Example
Hardware coherency enables big.LITTLE and simplifies software
Common memory view for all SoC components
Unified interrupts for complex processors
Optimized path to memory for best performance
Configurable interconnect enables flexible system design
Software debug, hardware performance trace
[Diagram: Cortex-A57 and Cortex-A53 clusters, GIC-500, I/O coherent masters, Mali-T760 GPU, Mali-V500, display and peripherals, with MMU-500s, NIC-400s, CCI-400 and TZC-400 in front of a DMC-400 memory system with third-party LPDDR3/4 to DRAM]
64-bit Mobile Sub-System: Debug and Trace
Run-control debug
Real-time trace
Off-chip trace
Self-hosted trace
Debugger access to peripherals
Debugger access to memory
System and software trace
Cross-communication of events
Event and trace correlation
Performance analysis
[Diagram: the same mobile sub-system with CoreSight debug and trace added: an ETM and PMU for each CPU cluster, CTM, TPIU, TMC, STM and a shared timestamp]
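With a shared timestamp distributed to every trace source, correlating events reduces to a timestamp-ordered merge once each stream has been decoded. A minimal sketch, assuming already-decoded records of the hypothetical form (timestamp, source, event); real ETM and STM decoding is out of scope:

```python
import heapq

# Hypothetical decoded trace records: (timestamp, source, event).
# Assumption: every source was stamped from the same global timestamp counter,
# so timestamps are directly comparable across sources.
etm_big = [(100, "ETM big", "branch to irq_handler"), (160, "ETM big", "return")]
etm_little = [(120, "ETM LITTLE", "enter idle")]
stm_sw = [(105, "STM", "driver: DMA start"), (150, "STM", "driver: DMA done")]


def correlate(*streams):
    """Merge per-source streams, each already sorted by timestamp, into one timeline."""
    return list(heapq.merge(*streams, key=lambda rec: rec[0]))


for ts, src, evt in correlate(etm_big, etm_little, stm_sw):
    print(f"{ts:>5} {src:<10} {evt}")
```

A k-way merge keeps the cost linear in the number of records and never needs the full streams in memory, which matters for long captures.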
Content in the Cloud Drives Intelligence in the Network
Wide range of network performance and intelligence behaviors
Content moving closer to the user for better performance
Rendering moving into the network for greater UI possibilities
[Diagram: the cloud/data center, the network, and display clients such as the set-top box (STB)]
Infrastructure Compute Challenges
Networking and data center infrastructure requires solving diverse problems…
Heterogeneous platforms for diverse environments: data center to shopping center!
Power efficiency and elasticity are always important
Evolving compute problems
Demanding performance/efficiency requirements
Different cores for different problems
Common SW framework on heterogeneous compute platforms
Heterogeneous Compute Requirements
Specialised processing: L1 (physical layer), content delivery, security; diverse requirements. Trend: advanced modulation schemes. Need: DSPs, accelerators.
Data plane processing: throughput-driven, I/O-intensive; deterministic performance. Trend: higher packet rates. Need: small cores at maximum efficiency.
Control plane processing: fast event processing; complex signalling. Trend: evolving software. Need: efficient, high compute performance.
MAC scheduling: real-time, latency-driven; multi-core processing. Trend: more complexity (LTE-A, 5G). Need: high-compute, low-latency performance.
High-bandwidth, low-latency interconnect
Wide range of implementations, from few to many coherent devices
Scalable Efficient Interconnect for Compelling Solutions
[Chart: system performance vs. system size for CCI-400, CCN-502, CCN-504, CCN-508 and CCN-512, spanning cost-efficient, mid-range and high-end systems, with interfaces moving from AMBA 4 ACE to AMBA 5 CHI]
Across the range:
Level-3 cache size: 0 MB to 32 MB
DDR bandwidth: 20 GB/s to 100 GB/s
On-chip bandwidth: 0.2 Tb/s to 1.8 Tb/s
Extending the ARM CoreLink™ Cache Coherent Network Family
• 2 new members extend the scalability of the CCN family
• Native AMBA 5 CHI interfaces providing high-frequency, non-blocking data transfers
• End-to-end QoS and RAS
• Integrated Level 3 cache and snoop filter
CoreLink CCN-502: High Performance, Small Footprint. Up to 4 clusters (16 cores) for small to mid-range systems
CoreLink CCN-512: Maximize Compute Density. Up to 12 clusters (48 cores) for high-end systems
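The CCN family integrates a snoop filter alongside the L3 cache. The idea can be shown with a toy directory model that records which clusters may hold each line, so coherence traffic goes only to those clusters rather than being broadcast. This is a simplified illustration with hypothetical names; it ignores evictions, finite filter capacity and cache states:

```python
from collections import defaultdict


class SnoopFilter:
    """Toy directory-style snoop filter: remembers which clusters may hold each line,
    so a request only snoops those clusters instead of broadcasting to all of them."""

    def __init__(self, num_clusters: int):
        self.num_clusters = num_clusters
        self.sharers = defaultdict(set)  # line address -> set of cluster ids
        self.snoops_sent = 0

    def read(self, cluster: int, line: int) -> None:
        # Snoop only the other clusters the filter says may hold the line.
        self.snoops_sent += len(self.sharers[line] - {cluster})
        self.sharers[line].add(cluster)

    def write(self, cluster: int, line: int) -> None:
        # Invalidate the other copies; afterwards the writer is the only holder.
        self.snoops_sent += len(self.sharers[line] - {cluster})
        self.sharers[line] = {cluster}


def broadcast_snoops(num_requests: int, num_clusters: int) -> int:
    """Without a filter, every request snoops every other cluster."""
    return num_requests * (num_clusters - 1)


# 12 clusters (the CCN-512 maximum); mostly private data with a little sharing.
sf = SnoopFilter(num_clusters=12)
requests = [("read", 0, 0x100), ("read", 1, 0x200), ("read", 0, 0x100),
            ("read", 2, 0x100), ("write", 2, 0x100), ("read", 3, 0x300)]
for op, cluster, line in requests:
    getattr(sf, op)(cluster, line)
print(sf.snoops_sent, "filtered vs", broadcast_snoops(len(requests), 12), "broadcast")
```

Here the filter sends 2 snoops where broadcasting to the 11 other clusters on every request would send 66; the gap widens as the cluster count grows, which is why a filter matters more for many-cluster systems.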
[Diagram: CoreLink™ CCN-512 Cache Coherent Network with snoop filter and 1-32 MB L3 cache, connecting Cortex-A57 and Cortex-A53 clusters or other Cortex CPU/CHI masters; four DMC-520 memory controllers (x72 DDR4-3200); GIC-500; CoreLink MMU-500 for I/O virtualisation; DSPs, PCIe, 10-40 GbE, DPI, crypto and SATA; and NIC-400 network interconnects to flash, USB, SRAM, GPIO and PCIe]
[Diagram: CoreLink™ CCN-502 Cache Coherent Network with snoop filter and 0-8 MB L3 cache, connecting Cortex-A57 and Cortex-A53 clusters; four DMC-520 memory controllers (x72 DDR4-3200); GIC-500; CoreLink MMU-500 for I/O virtualisation; DSPs, PCIe, 10-40 GbE and SATA; and NIC-400 network interconnects to flash, USB, SRAM, GPIO and PCIe]
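The CCN-512 diagram shows four DMC-520 controllers with x72 DDR4-3200 interfaces, and the scalability chart quotes up to 100 GB/s of DDR bandwidth. A quick check of peak theoretical bandwidth, assuming each x72 interface carries 64 data bits plus 8 ECC bits (the helper name is hypothetical):

```python
# Peak theoretical DRAM bandwidth for the DMC-520 configurations on the slide.
# Assumption: an x72 channel carries 64 data bits plus 8 ECC bits, so 8 bytes of
# data move per transfer; DDR4-3200 means 3200 mega-transfers per second.
def peak_channel_gb_per_s(mega_transfers: int, data_bits: int = 64) -> float:
    """Peak bandwidth of one channel in GB/s (10**9 bytes per second)."""
    return mega_transfers * 1e6 * (data_bits // 8) / 1e9


per_channel = peak_channel_gb_per_s(3200)  # DDR4-3200
total = 4 * per_channel                    # four DMC-520 channels, as drawn
print(f"{per_channel:.1f} GB/s per channel, {total:.1f} GB/s total")
print(f"DDR4-2400: {peak_channel_gb_per_s(2400):.1f} GB/s per channel")
```

Four channels give 102.4 GB/s peak, in line with the ~100 GB/s figure; sustained bandwidth is lower in practice because of refresh, bank conflicts and read/write turnarounds.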
Scalable Platform for Diverse Processing Needs
Cost-efficient, power-optimized: Cortex-A7, Cortex-A53; CCI-400, CCN-502
Mid-range performance: Cortex-A53, Cortex-A57; CCN-502, CCN-504
High-performance networking and server: Cortex-A53, Cortex-A57; CCN-508, CCN-512
Efficient Hardware-Assisted Virtualization
Direct hardware access with MMU-500
Low-latency interrupt delivery with GIC-500
Support for on-chip or off-chip peripherals
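MMU-500 implements the ARM SMMU architecture with both stage 1 and stage 2 translation, which is what lets a guest program a device directly while the hypervisor still controls which physical memory it can reach. A toy model of the two-stage lookup, with single-level dict page tables and hypothetical addresses; a real SMMU walks multi-level tables, caches results in TLBs and selects a context per stream ID:

```python
# Toy model of two-stage address translation as performed by an SMMU such as
# MMU-500: stage 1 (owned by the guest OS) maps a device virtual address to an
# intermediate physical address (IPA); stage 2 (owned by the hypervisor) maps the
# IPA to a physical address. Page tables are plain dicts keyed by page number;
# the page size and all addresses are hypothetical.
PAGE_SIZE = 4096


class TranslationFault(Exception):
    pass


def lookup(table: dict, addr: int, stage: int) -> int:
    page, offset = divmod(addr, PAGE_SIZE)
    if page not in table:
        raise TranslationFault(f"stage {stage} fault at {addr:#x}")
    return table[page] * PAGE_SIZE + offset


def translate(va: int, stage1: dict, stage2: dict) -> int:
    ipa = lookup(stage1, va, stage=1)    # guest-controlled mapping
    return lookup(stage2, ipa, stage=2)  # hypervisor-controlled mapping


guest_stage1 = {0x10: 0x80}  # device VA page 0x10 -> IPA page 0x80
hyp_stage2 = {0x80: 0x4321}  # IPA page 0x80 -> PA page 0x4321

pa = translate(0x10123, guest_stage1, hyp_stage2)
print(hex(pa))  # 0x4321123
```

An access outside either mapping raises a fault at the stage that failed, so a guest can only reach physical pages the hypervisor has mapped at stage 2.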
64-bit Infrastructure System Example
Software debug, hardware debug and system profiling
Configurable interconnect enables flexible system design
Common memory view for all SoC components
Unified interrupts for complex processors
Optimized path to memory for best performance
Hardware coherency enables scaling and simplifies software
[Diagram: CoreLink™ CCN-502 Cache Coherent Network with snoop filter and 0-8 MB L3 cache, connecting Cortex-A57 and Cortex-A53 clusters; four DMC-520 memory controllers (x72 DDR4-2400/DDR4-3200); GIC-500; CoreLink MMU-500 for I/O virtualisation; DSPs, PCIe, 10-40 GbE and SATA; and NIC-400 to flash, USB, SRAM, GPIO and PCIe]
Mobile Picking Up the Pace and Reach
Need for more performance in a constrained thermal envelope
Premium mobile expects something new every year
ARMv8-A and the shift to 64-bit
Scalable Platforms for Diverse Processing Needs
[Diagram: target segments: cable modem, DSL modem, home gateway, set-top box, femto BTS, optical network termination, Wi-Fi access point, cellular remote radio head/antenna, cellular small-cell and macro-cell base stations, base station controller, microwave backhaul, DSLAM, optical line termination, mobile broadband access and aggregation, edge router, B-RAS, SGSN, GGSN, evolved packet core, core router, optical core networking equipment, SDN, NFV, Cloud RAN equipment, storage array network controller, scale-out storage, edge server, core server, email and web, media content and web, HPC scientific compute, CDN, XaaS cloud]
Juno – The First ARMv8-A 64-bit Software Development Target
Juno: premium ARMv8-A software target platform, available now
ARMv8-A: 64-bit applications support enabled, with 100% compatibility for 32-bit applications
System IP: interconnect, interrupts, virtualization, debug and trace
Software: 64- and 32-bit