dr. john bainbridge, principal application architect, netspeed

23
May 9, 2016 Configurable, Coherent SoC Interconnect for Heterogeneous Multiprocessing and Storage Applications Dr. John Bainbridge ChipEx 2016 May 9 th , 2016

Upload: chiportal

Post on 09-Jan-2017

61 views

Category:

Business


0 download

TRANSCRIPT

Page 1: Dr. John Bainbridge, Principal Application Architect, NetSpeed

May 9, 2016

Configurable, Coherent SoC Interconnect for Heterogeneous Multiprocessing and Storage ApplicationsDr. John Bainbridge

ChipEx 2016May 9th, 2016

Page 2: Dr. John Bainbridge, Principal Application Architect, NetSpeed

May 9, 2016 2

John Bainbridge, ChipEx 2016 | © Copyright 2016 NetSpeed Systems

MARKET TRENDS AND CHALLENGES

Page 3: Dr. John Bainbridge, Principal Application Architect, NetSpeed

May 9, 2016 3

John Bainbridge, ChipEx 2016 | © Copyright 2016 NetSpeed Systems

Discontinuities in the Compute InfrastructureNew Demand Profile Driving New Architectures

MAIN FRAMES

1980s

Compute

CLIENT / SERVER

1990s

Desktop

DATA CENTERS

2000s

Internet

CLOUDCOMPUTING

Now

Software Defined Everything

General Purpose > Workload specific Hardware acceleration for SoftwareCustomized SKUs

Page 4: Dr. John Bainbridge, Principal Application Architect, NetSpeed

May 9, 2016 4

John Bainbridge, ChipEx 2016 | © Copyright 2016 NetSpeed Systems

Cloud Architectures Driving Heterogeneous ComputingIncreased Efficiency From Hardware Specialization

Source: ISSC Proceedings

Page 5: Dr. John Bainbridge, Principal Application Architect, NetSpeed

May 9, 2016 5

John Bainbridge, ChipEx 2016 | © Copyright 2016 NetSpeed Systems

Cloud Architectures Driving Heterogeneous ComputingReconfigurable Computing – ASIC + FPGA

Source: Hot Chips ’14 Proceedings

Page 6: Dr. John Bainbridge, Principal Application Architect, NetSpeed

May 9, 2016 6

John Bainbridge, ChipEx 2016 | © Copyright 2016 NetSpeed Systems

Discontinuities in Chip Design TechniquesChanging Abstraction Levels

CUSTOM

1980s

Sea of xtors

ASIC

1990s

Sea of Cells

SOC

2000s

Sea of Blocks

NEXT-GEN SOCS

Now

Sea of IPs

Time-to-Market Pressure

Performance Analysis from Day #1Create Differentiated Platforms

Page 7: Dr. John Bainbridge, Principal Application Architect, NetSpeed

May 9, 2016 7

John Bainbridge, ChipEx 2016 | © Copyright 2016 NetSpeed Systems

Evolution of Coherency SolutionsCoherency Participation Has Increased

CPU

Memory

L1L2

CPU

L1CPU

L1

MemoryL3

Memory

L2

CPUL1

CPUL1

L2

CPUL1

CPUL1

Last Level CacheMemory

L2

CPUL1

CPUL1

L2

CPUL1

CPUL1

GPU

Acce

lera

tor

Number of IP blocks have exploded and More agents are participating in coherency Requirements moving from homogeneous heterogeneous

Page 8: Dr. John Bainbridge, Principal Application Architect, NetSpeed

May 9, 2016 8

John Bainbridge, ChipEx 2016 | © Copyright 2016 NetSpeed Systems

The Need for Flexible Coherency SolutionsFlexibility Creates Opportunities

APPLICATION Improve latency of

critical traffic Improve compute

cluster latency

FLOORPLAN Reduce interconnect

congestion Utilize unused die area

with additional cache capacity

POWER Partition solution to

enhance opportunities for power-gating

Lower BW requirement

Page 9: Dr. John Bainbridge, Principal Application Architect, NetSpeed

May 9, 2016 9

John Bainbridge, ChipEx 2016 | © Copyright 2016 NetSpeed Systems

SOLUTION

Page 10: Dr. John Bainbridge, Principal Application Architect, NetSpeed

May 9, 2016 10

John Bainbridge, ChipEx 2016 | © Copyright 2016 NetSpeed Systems

NetSpeed Technology OverviewBuilt From Requirements, Optimized For Specific Applications

PERFORMANCE, POWER,

AREAWORKLOAD

MODELSCOMPUTE ENGINES

NETSPEED IP AND SOC ARCHITECTURE

SYNTHESIS

CUSTOMIZED SOC PERFORMA

NCE OPTIMIZED

CORRECT-BY-

CONSTRUCTION

CACHECOHERENT

Page 11: Dr. John Bainbridge, Principal Application Architect, NetSpeed

May 9, 2016 11

John Bainbridge, ChipEx 2016 | © Copyright 2016 NetSpeed Systems

NetSpeed Gemini OverviewScalable, Configurable Cache Coherent NoC

CONFIGURABLE SOLUTION

LATENCY OPTIMIZED SOLUTION

SCALABLESOLUTION

Page 12: Dr. John Bainbridge, Principal Application Architect, NetSpeed

May 9, 2016 12

John Bainbridge, ChipEx 2016 | © Copyright 2016 NetSpeed Systems

TECHNOLOGY DETAILS

Page 13: Dr. John Bainbridge, Principal Application Architect, NetSpeed

May 9, 2016 13

John Bainbridge, ChipEx 2016 | © Copyright 2016 NetSpeed Systems

Configurable SolutionFloorplan Aware Design Flow

• Components, Floorplan• Coherency

Requirements• PPA, SoC Level Use

Cases

Step 1: Specify• Rapid Architecture

Exploration• Real-time Customization• Power Optimization

Step 2: Customize

Synthesizable RTL

Verification IP

Physical Design Collateral

IPXACT

Performance stats, Programmers Guide, etc.

• Customized SoC Interconnect IP

Step 3: Generate

ConfigurableLatency Optimized

Scalable

Page 14: Dr. John Bainbridge, Principal Application Architect, NetSpeed

May 9, 2016 14

John Bainbridge, ChipEx 2016 | © Copyright 2016 NetSpeed Systems

Configurable SolutionEnabling Cache Hierarchy Customization

Add last level caches to reduce critical latencies

Use multiple coherency controllers or caches to increase bandwidth

Improve cache utilization by controlling address ranges serviced by each cache

Support coherency across a mix of different slaves devices

ConfigurableLatency Optimized

Scalable

Page 15: Dr. John Bainbridge, Principal Application Architect, NetSpeed

May 9, 2016 15

John Bainbridge, ChipEx 2016 | © Copyright 2016 NetSpeed Systems

Correct-by-Construction Formal Analysis to Build Deadlock-free NoC

Formally ProvenFormal techniques and graph theory algorithms

Correct-by-constructionUser-driven traffic dependencies

RobustHandles complex topologies and routing

ConfigurableLatency Optimized

Scalable

Page 16: Dr. John Bainbridge, Principal Application Architect, NetSpeed

May 9, 2016 16

John Bainbridge, ChipEx 2016 | © Copyright 2016 NetSpeed Systems

Latency Optimized SolutionTraffic Based Rapid Reconfiguration

Fully heterogeneous in determining channel & buffer sizes

Ring

TreeHeterogeneous

Mesh

NetSpeed RouterNetSpeed Auto-generated Links

ConfigurableLatency Optimized

Scalable

Page 17: Dr. John Bainbridge, Principal Application Architect, NetSpeed

May 9, 2016 17

John Bainbridge, ChipEx 2016 | © Copyright 2016 NetSpeed Systems

Latency Optimized SolutionPhysically Distributed Coherency

GPULast Level

Cache

CPU CPU CPU

CPU

Last LevelCache

CCC

CCC

CCC

DSP

Lower latency by placing coherency controllers and caches where they are accessed the most

Reduce congestion by handling requests locally and using caches to reduce traffic to memory

Adjust cache hierarchy to support floorplan requirements

Improve die utilization by placing caches in empty die space

ConfigurableLatency Optimized

Scalable

Page 18: Dr. John Bainbridge, Principal Application Architect, NetSpeed

May 9, 2016 18

John Bainbridge, ChipEx 2016 | © Copyright 2016 NetSpeed Systems

Latency Optimized SolutionPhysically Distributed Coherency

FastPath – Dedicated connections for latency sensitive traffic to reduce arbitration, congestion

MultiLayer – Isolate traffic with different performance requirements

QoS configurability for bandwidth allocation and prioritization

ACE Master

#0

ACE Master

#1

AC E AC E

FastPathTM Bridge

FastPathTM Bridge

AC E

CCC

NoC

Master Bridge

AC E

Master Bridge

FastPathTM Reducing CPU latency

ConfigurableLatency Optimized

Scalable

Page 19: Dr. John Bainbridge, Principal Application Architect, NetSpeed

May 9, 2016 19

John Bainbridge, ChipEx 2016 | © Copyright 2016 NetSpeed Systems

Scalable SolutionScalable Coherency Bandwidth

ConfigurableLatency Optimized

Scalable

More coherent lookups per cycle– Increased coherency

bandwidth through address-sliced coherency controllers

– Automated determination of coherency controllers

Page 20: Dr. John Bainbridge, Principal Application Architect, NetSpeed

May 9, 2016 20

John Bainbridge, ChipEx 2016 | © Copyright 2016 NetSpeed Systems

Scalable SolutionImportance of Filtering Snoops

ConfigurableLatency Optimized

Scalable

Scalability requires significant filtering of snoops Directory structure delivers efficient solution in lesser area

1.0

2.0

1.5

0.5

0.0

3.0

2.5

90%

85%Filter Rate (%)

Size

of D

irect

ory

/ Sno

op Fi

lter (

MB)

100%

95%

50%

80%

70%

75%

65%

Page 21: Dr. John Bainbridge, Principal Application Architect, NetSpeed

May 9, 2016 21

John Bainbridge, ChipEx 2016 | © Copyright 2016 NetSpeed Systems

Scalable SolutionDirectory Scaling

ConfigurableLatency Optimized

Scalable

NETSPEED SOLUTION:– Superior directory design– Linear growth vs. O(n2) growth in directory

Directory size grows O(n2)– Number of Entries & Number of Agents

TAGNumber

ofEntries

DIRECTORY

OwnershipVector

Number of

Agents

Page 22: Dr. John Bainbridge, Principal Application Architect, NetSpeed

May 9, 2016 22

John Bainbridge, ChipEx 2016 | © Copyright 2016 NetSpeed Systems

Scalable SolutionReducing Conflicts

ConfigurableLatency Optimized

Scalable

Power

Performance

ONE-TO-ONE DIRECTORY– Directory = sum(cache associativity) – Dynamic power with traffic &

number of agents– O(n2)

GLOBAL SHARED DIRECTORY– Address conflicts reduce performance– Limited associativity reduces power– Conflict rate increases with cache

associativity

NETSPEED SOLUTION:– Limits associativity and dynamic power– Advanced techniques to avoid address conflicts

Page 23: Dr. John Bainbridge, Principal Application Architect, NetSpeed

May 9, 2016 23

John Bainbridge, ChipEx 2016 | © Copyright 2016 NetSpeed Systems

Summary

PROBLEM

New architectures driving need for heterogeneous computing

Exploding complexity with shrinking timelines

Need configurable coherency solutions

SOLUTION NetSpeed Architecture Synthesis Platform NetSpeed Gemini: Configurable and

Scalable Coherent NoC IP

BENEFITS Flexible, correct-by-construction IP Higher performance with lower latency Build customized application-specific

solution