dr. john bainbridge, principal application architect, netspeed
TRANSCRIPT
May 9, 2016
Configurable, Coherent SoC Interconnect for Heterogeneous Multiprocessing and Storage ApplicationsDr. John Bainbridge
ChipEx 2016May 9th, 2016
May 9, 2016 2
John Bainbridge, ChipEx 2016 | © Copyright 2016 NetSpeed Systems
MARKET TRENDS AND CHALLENGES
May 9, 2016 3
John Bainbridge, ChipEx 2016 | © Copyright 2016 NetSpeed Systems
Discontinuities in the Compute InfrastructureNew Demand Profile Driving New Architectures
MAIN FRAMES
1980s
Compute
CLIENT / SERVER
1990s
Desktop
DATA CENTERS
2000s
Internet
CLOUDCOMPUTING
Now
Software Defined Everything
General Purpose > Workload specific Hardware acceleration for SoftwareCustomized SKUs
May 9, 2016 4
John Bainbridge, ChipEx 2016 | © Copyright 2016 NetSpeed Systems
Cloud Architectures Driving Heterogeneous ComputingIncreased Efficiency From Hardware Specialization
Source: ISSC Proceedings
May 9, 2016 5
John Bainbridge, ChipEx 2016 | © Copyright 2016 NetSpeed Systems
Cloud Architectures Driving Heterogeneous ComputingReconfigurable Computing – ASIC + FPGA
Source: Hot Chips ’14 Proceedings
May 9, 2016 6
John Bainbridge, ChipEx 2016 | © Copyright 2016 NetSpeed Systems
Discontinuities in Chip Design TechniquesChanging Abstraction Levels
CUSTOM
1980s
Sea of xtors
ASIC
1990s
Sea of Cells
SOC
2000s
Sea of Blocks
NEXT-GEN SOCS
Now
Sea of IPs
Time-to-Market Pressure
Performance Analysis from Day #1Create Differentiated Platforms
May 9, 2016 7
John Bainbridge, ChipEx 2016 | © Copyright 2016 NetSpeed Systems
Evolution of Coherency SolutionsCoherency Participation Has Increased
CPU
Memory
L1L2
CPU
L1CPU
L1
MemoryL3
Memory
L2
CPUL1
CPUL1
L2
CPUL1
CPUL1
Last Level CacheMemory
L2
CPUL1
CPUL1
L2
CPUL1
CPUL1
GPU
Acce
lera
tor
Number of IP blocks have exploded and More agents are participating in coherency Requirements moving from homogeneous heterogeneous
May 9, 2016 8
John Bainbridge, ChipEx 2016 | © Copyright 2016 NetSpeed Systems
The Need for Flexible Coherency SolutionsFlexibility Creates Opportunities
APPLICATION Improve latency of
critical traffic Improve compute
cluster latency
FLOORPLAN Reduce interconnect
congestion Utilize unused die area
with additional cache capacity
POWER Partition solution to
enhance opportunities for power-gating
Lower BW requirement
May 9, 2016 9
John Bainbridge, ChipEx 2016 | © Copyright 2016 NetSpeed Systems
SOLUTION
May 9, 2016 10
John Bainbridge, ChipEx 2016 | © Copyright 2016 NetSpeed Systems
NetSpeed Technology OverviewBuilt From Requirements, Optimized For Specific Applications
PERFORMANCE, POWER,
AREAWORKLOAD
MODELSCOMPUTE ENGINES
NETSPEED IP AND SOC ARCHITECTURE
SYNTHESIS
CUSTOMIZED SOC PERFORMA
NCE OPTIMIZED
CORRECT-BY-
CONSTRUCTION
CACHECOHERENT
May 9, 2016 11
John Bainbridge, ChipEx 2016 | © Copyright 2016 NetSpeed Systems
NetSpeed Gemini OverviewScalable, Configurable Cache Coherent NoC
CONFIGURABLE SOLUTION
LATENCY OPTIMIZED SOLUTION
SCALABLESOLUTION
May 9, 2016 12
John Bainbridge, ChipEx 2016 | © Copyright 2016 NetSpeed Systems
TECHNOLOGY DETAILS
May 9, 2016 13
John Bainbridge, ChipEx 2016 | © Copyright 2016 NetSpeed Systems
Configurable SolutionFloorplan Aware Design Flow
• Components, Floorplan• Coherency
Requirements• PPA, SoC Level Use
Cases
Step 1: Specify• Rapid Architecture
Exploration• Real-time Customization• Power Optimization
Step 2: Customize
Synthesizable RTL
Verification IP
Physical Design Collateral
IPXACT
Performance stats, Programmers Guide, etc.
• Customized SoC Interconnect IP
Step 3: Generate
ConfigurableLatency Optimized
Scalable
May 9, 2016 14
John Bainbridge, ChipEx 2016 | © Copyright 2016 NetSpeed Systems
Configurable SolutionEnabling Cache Hierarchy Customization
Add last level caches to reduce critical latencies
Use multiple coherency controllers or caches to increase bandwidth
Improve cache utilization by controlling address ranges serviced by each cache
Support coherency across a mix of different slaves devices
ConfigurableLatency Optimized
Scalable
May 9, 2016 15
John Bainbridge, ChipEx 2016 | © Copyright 2016 NetSpeed Systems
Correct-by-Construction Formal Analysis to Build Deadlock-free NoC
Formally ProvenFormal techniques and graph theory algorithms
Correct-by-constructionUser-driven traffic dependencies
RobustHandles complex topologies and routing
ConfigurableLatency Optimized
Scalable
May 9, 2016 16
John Bainbridge, ChipEx 2016 | © Copyright 2016 NetSpeed Systems
Latency Optimized SolutionTraffic Based Rapid Reconfiguration
Fully heterogeneous in determining channel & buffer sizes
Ring
TreeHeterogeneous
Mesh
NetSpeed RouterNetSpeed Auto-generated Links
ConfigurableLatency Optimized
Scalable
May 9, 2016 17
John Bainbridge, ChipEx 2016 | © Copyright 2016 NetSpeed Systems
Latency Optimized SolutionPhysically Distributed Coherency
GPULast Level
Cache
CPU CPU CPU
CPU
Last LevelCache
CCC
CCC
CCC
DSP
Lower latency by placing coherency controllers and caches where they are accessed the most
Reduce congestion by handling requests locally and using caches to reduce traffic to memory
Adjust cache hierarchy to support floorplan requirements
Improve die utilization by placing caches in empty die space
ConfigurableLatency Optimized
Scalable
May 9, 2016 18
John Bainbridge, ChipEx 2016 | © Copyright 2016 NetSpeed Systems
Latency Optimized SolutionPhysically Distributed Coherency
FastPath – Dedicated connections for latency sensitive traffic to reduce arbitration, congestion
MultiLayer – Isolate traffic with different performance requirements
QoS configurability for bandwidth allocation and prioritization
ACE Master
#0
ACE Master
#1
AC E AC E
FastPathTM Bridge
FastPathTM Bridge
AC E
CCC
NoC
Master Bridge
AC E
Master Bridge
FastPathTM Reducing CPU latency
ConfigurableLatency Optimized
Scalable
May 9, 2016 19
John Bainbridge, ChipEx 2016 | © Copyright 2016 NetSpeed Systems
Scalable SolutionScalable Coherency Bandwidth
ConfigurableLatency Optimized
Scalable
More coherent lookups per cycle– Increased coherency
bandwidth through address-sliced coherency controllers
– Automated determination of coherency controllers
May 9, 2016 20
John Bainbridge, ChipEx 2016 | © Copyright 2016 NetSpeed Systems
Scalable SolutionImportance of Filtering Snoops
ConfigurableLatency Optimized
Scalable
Scalability requires significant filtering of snoops Directory structure delivers efficient solution in lesser area
1.0
2.0
1.5
0.5
0.0
3.0
2.5
90%
85%Filter Rate (%)
Size
of D
irect
ory
/ Sno
op Fi
lter (
MB)
100%
95%
50%
80%
70%
75%
65%
May 9, 2016 21
John Bainbridge, ChipEx 2016 | © Copyright 2016 NetSpeed Systems
Scalable SolutionDirectory Scaling
ConfigurableLatency Optimized
Scalable
NETSPEED SOLUTION:– Superior directory design– Linear growth vs. O(n2) growth in directory
Directory size grows O(n2)– Number of Entries & Number of Agents
TAGNumber
ofEntries
DIRECTORY
OwnershipVector
Number of
Agents
May 9, 2016 22
John Bainbridge, ChipEx 2016 | © Copyright 2016 NetSpeed Systems
Scalable SolutionReducing Conflicts
ConfigurableLatency Optimized
Scalable
Power
Performance
ONE-TO-ONE DIRECTORY– Directory = sum(cache associativity) – Dynamic power with traffic &
number of agents– O(n2)
GLOBAL SHARED DIRECTORY– Address conflicts reduce performance– Limited associativity reduces power– Conflict rate increases with cache
associativity
NETSPEED SOLUTION:– Limits associativity and dynamic power– Advanced techniques to avoid address conflicts
May 9, 2016 23
John Bainbridge, ChipEx 2016 | © Copyright 2016 NetSpeed Systems
Summary
PROBLEM
New architectures driving need for heterogeneous computing
Exploding complexity with shrinking timelines
Need configurable coherency solutions
SOLUTION NetSpeed Architecture Synthesis Platform NetSpeed Gemini: Configurable and
Scalable Coherent NoC IP
BENEFITS Flexible, correct-by-construction IP Higher performance with lower latency Build customized application-specific
solution