Miller: On-Chip Optical Communications
TRANSCRIPT
-
8/8/2019 Miller.on Chip Optical Communications
1/24
On-Chip Optical Communication for Multicore Processors
Jason Miller
Carbon Research Group
MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LAB
The Future of Multicore
Number of cores doubles every 18 months
Parallelism replaces clock frequency scaling and core complexity
Resulting challenges: scalability, programming, power
Examples: MIT Raw, Sun UltraSPARC T2, IBM XCell 8i, Tilera TILE64
Multicore Challenges
Scalability: How do we turn additional cores into additional performance?
Must accelerate single apps, not just run more apps in parallel
Efficient core-to-core communication is crucial
Architectures must grow easily with each new technology generation
Programming: Traditional parallel programming techniques are hard
Parallel machines were rare and used only by rocket scientists
Multicores are ubiquitous and must be programmable by anyone
Power: Already a first-order design constraint
More cores and more communication → more power
Previous tricks (e.g. lowering Vdd) are running out of steam
Multicore Communication Today
Bus-based interconnect:
A single shared resource with uniform communication cost
Communication happens through shared memory
Doesn't scale to many cores due to contention and long wires
Scalable only up to about 8 cores
[Figure: bus-based interconnect: processor cores (p) with caches (c) sharing a bus to an L2 cache and DRAM]
Multicore Programming Trends
Meshes and small cores solve the physical scaling challenge, but programming remains a barrier
Parallelizing applications to thousands of cores is hard:
Task and data partitioning
Communication becomes critical as latencies increase
Increasing contention for distant communication → degraded performance, higher energy
Broadcast-style communication is inefficient: a major source of contention, and expensive to distribute a signal electrically
Multicore Programming Trends
For high performance, communication and locality must be managed
Tasks and data must be both partitioned and placed:
Analyze communication patterns to minimize latencies
Place data near the code that needs it most
Place certain code near critical resources (e.g. DRAM, I/O)
Dynamic, unpredictable communication is impossible to optimize
Orchestrating communication and locality increases programming difficulty exponentially
Improving Programmability
Observations:
A cheap broadcast communication mechanism can make programming easier
Enables convenient programming models (e.g., shared memory)
Reduces the need to carefully manage locality
On-chip optical components enable cheap, energy-efficient broadcast
Optical Broadcast Network
A waveguide passes through every core
Multiple wavelengths (WDM) eliminate contention
Signal reaches all cores in
Optical bit transmission
[Figure: optical link between a sending core and a receiving core. At the sender, a flip-flop and modulator driver feed a modulator that couples the multi-wavelength source waveguide onto the data waveguide; at the receiver, a filter, photodetector, and transimpedance amplifier recover the bit into a flip-flop.]
Each core sends data using a different wavelength → no contention
Data is sent once; any or all cores can receive it → efficient broadcast
Core-to-core communication
32-bit data words are transmitted across several parallel waveguides
Each core contains receive filters and a FIFO buffer for every sender
Data is buffered at the receiver until needed by the processing core
The receiver can screen data by sender (i.e. wavelength) or by message type
[Figure: sending cores A and B each drive a 32-bit optical channel; each receiving core holds one FIFO per sender, all feeding its processor core]
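The per-sender buffering described above can be sketched in software. This is a hypothetical model, not the actual hardware interface: the class and method names (`OpticalReceiver`, `on_photon`, `receive_from`, `receive_type`) are invented for illustration, assuming one dedicated FIFO per sender wavelength and screening by sender or message type.

```python
from collections import deque

class OpticalReceiver:
    """Sketch of an ATAC-style receive buffer: one FIFO per sender wavelength.

    Hypothetical model for illustration only. Each sender's 32-bit words land
    in that sender's dedicated FIFO, so the processing core can dequeue by
    sender (i.e. wavelength) or by message type, as the slide describes.
    """

    def __init__(self, num_senders):
        self.fifos = [deque() for _ in range(num_senders)]

    def on_photon(self, sender_id, word, msg_type):
        # The filter tuned to this sender's wavelength routes the word
        # into that sender's FIFO; no contention with other senders.
        self.fifos[sender_id].append((msg_type, word))

    def receive_from(self, sender_id):
        # Dequeue the oldest buffered word from one specific sender.
        msg_type, word = self.fifos[sender_id].popleft()
        return word

    def receive_type(self, wanted_type):
        # Screen buffered data by message type across all senders.
        for fifo in self.fifos:
            for i, (msg_type, word) in enumerate(fifo):
                if msg_type == wanted_type:
                    del fifo[i]
                    return word
        return None  # nothing of that type buffered

rx = OpticalReceiver(num_senders=64)
rx.on_photon(sender_id=3, word=0xDEADBEEF, msg_type="data")
rx.on_photon(sender_id=7, word=42, msg_type="barrier")
assert rx.receive_from(3) == 0xDEADBEEF
assert rx.receive_type("barrier") == 42
```

The point of the per-sender FIFO is that a receiver never blocks a sender: every sender always has its own buffer, mirroring the contention-free wavelength assignment.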
ATAC Bandwidth
64 cores, 32 lines, 1 Gb/s per line
Transmit BW: 64 cores x 1 Gb/s x 32 lines = 2 Tb/s
Receive-weighted BW: 2 Tb/s x 63 receivers = 126 Tb/s
Receive-weighted bandwidth is a good metric for broadcast networks: it reflects the gain from WDM
ATAC allows better utilization of computational resources because less time is spent performing communication
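The bandwidth arithmetic above is easy to check. A minimal sketch, using only the figures stated on the slide (64 cores, 32 lines at 1 Gb/s, 63 receivers per broadcast):

```python
# Back-of-the-envelope check of the ATAC bandwidth figures.
cores = 64       # one sending wavelength per core
lines = 32       # parallel waveguide lines
rate_gbps = 1    # 1 Gb/s per line per wavelength

# Aggregate transmit bandwidth: every core can send on all lines at once.
transmit_bw = cores * lines * rate_gbps          # in Gb/s
assert transmit_bw == 2048                       # = 2 Tb/s

# Receive-weighted bandwidth: each transmitted bit is usefully
# delivered to all 63 other cores, which is what WDM broadcast buys.
receive_weighted_bw = transmit_bw * (cores - 1)  # in Gb/s
assert receive_weighted_bw == 129024             # = 126 Tb/s

print(transmit_bw // 1024, "Tb/s transmit,",
      receive_weighted_bw // 1024, "Tb/s receive-weighted")
```

The 63x gap between the two numbers is exactly the broadcast advantage: an electrical network would have to retransmit the same bit 63 times to match the receive-weighted figure.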
System Capabilities and Performance
Baseline: Raw multicore chip (leading-edge tiled multicore) vs. ATAC multicore chip (future optical-interconnect multicore), both 64-core systems in a 65nm process:

Metric                  | Raw        | ATAC
Peak performance        | 64 GOPS    | 64 GOPS
Chip power              | 24 W       | 25.5 W
Theoretical power eff.  | 2.7 GOPS/W | 2.5 GOPS/W
Effective performance   | 7.3 GOPS   | 38.0 GOPS
Effective power eff.    | 0.3 GOPS/W | 1.5 GOPS/W
Total system power      | 150 W      | 153 W

Optical communications require a small amount of additional system power but allow for much better utilization of computational resources.
Programming ATAC
Cores can directly communicate with any other core in one hop (
Communication-centric Computing
ATAC reduces off-chip memory calls, and hence energy and latency
A view of extended global memory can be enabled cheaply with on-chip distributed cache memory and the ATAC network
[Figure: in a bus-based multicore, cores repeatedly go through shared memory at ~500 pJ per access; in an ATAC multicore, core-to-core transfers cost ~3 pJ each]

Operation            | Energy | Latency
Network transfer     | 3 pJ   | 3 cycles
ALU add operation    | 2 pJ   | 1 cycle
32KB cache read      | 50 pJ  | 1 cycle
Off-chip memory read | 500 pJ | 250 cycles
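The table makes the argument concrete: serving a word from a remote core's on-chip cache over the network is far cheaper than going off-chip. A minimal sketch using only the table's numbers (the cost model of one cache read plus one network transfer for a remote fetch is an assumption for illustration, not from the slide):

```python
# Energy/latency per operation, from the table above.
NETWORK_XFER = (3, 3)      # (pJ, cycles)
ALU_ADD      = (2, 1)
CACHE_READ   = (50, 1)
DRAM_READ    = (500, 250)

def remote_cache_fetch():
    # Assumed cost model: the owning core reads its 32KB cache once,
    # then ships the word over the ATAC network.
    energy = CACHE_READ[0] + NETWORK_XFER[0]
    latency = CACHE_READ[1] + NETWORK_XFER[1]
    return energy, latency

e_net, t_net = remote_cache_fetch()
e_mem, t_mem = DRAM_READ

print(f"remote cache via ATAC: {e_net} pJ, {t_net} cycles")
print(f"off-chip DRAM read:    {e_mem} pJ, {t_mem} cycles")
# Under these numbers: ~9x less energy and ~60x less latency on-chip.
```

This is why the slide calls the approach communication-centric: with a 3 pJ / 3-cycle network, keeping data in other cores' caches beats refetching it from DRAM by roughly an order of magnitude in energy.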
Summary
ATAC uses optical networks to enable multicore programming and performance scaling
ATAC encourages a communication-centric architecture, which helps multicore performance and power scalability
ATAC simplifies programming with a contention-free all-to-all broadcast network
ATAC is enabled by recent advances in CMOS integration of optical components
Backup Slides
What Does the Future Look Like?
Corollary of Moore's law: the number of cores will double every 18 months

Year     | '02 | '05 | '08 | '11  | '14
Research | 16  | 64  | 256 | 1024 | 4096
Industry | 4   | 16  | 64  | 256  | 1024

(Cores minimally big enough to run a self-respecting
1K cores by 2014! Are we ready?
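The table above follows directly from the doubling rule: doubling every 18 months means a 4x jump every 3 years. A quick sketch (the helper name `cores_in` is invented for illustration):

```python
# Project core counts under the slide's corollary of Moore's law:
# the number of cores doubles every 18 months.
def cores_in(year, base_year, base_cores):
    doublings = (year - base_year) * 12 / 18  # months elapsed / 18
    return base_cores * 2 ** doublings

# Research starts at 16 cores in 2002, industry at 4 cores.
for year in (2005, 2008, 2011, 2014):
    research = int(cores_in(year, 2002, 16))
    industry = int(cores_in(year, 2002, 4))
    print(year, research, industry)
# Reproduces the table rows: research hits 4096 and industry 1024 by 2014.
```

Both trajectories land exactly on the slide's figures, including the headline claim that research designs reach 1K cores (and beyond) by 2014.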
Scaling to 1000 Cores
A purely optical design scales to about 64 cores
Beyond that, clusters of cores share optical hubs
ENet and BNet move data to/from the optical hub
These are dedicated, special-purpose electrical networks
[Figure: 64 optically-connected clusters on the ONet; within each cluster, electrical networks (ENet, BNet) connect 16 cores, with their processors, directories, and caches, plus memory, to the cluster's optical hub]
ATAC is an Efficient Network
Modulators are the primary source of power consumption
Receive power: requires only ~2 fJ/bit even with -5 dB link loss
Modulator power: Ge-Si EA design, ~75 fJ/bit (assuming 50 fJ/bit for the modulator driver)

Example: 64-core communication
(i.e. N = 64 cores = 64 wavelengths; for a 32-bit word: 2048 drops/core and 32 adds/core)
Receive power: 2 fJ/bit x 1 Gbit/s x 32 bits x N^2 = 262 mW
Modulator power: 75 fJ/bit x 1 Gbit/s x 32 bits x N = 153 mW
Total energy/bit = 75 fJ/bit + 2 fJ/bit x (N-1) = 201 fJ/bit

Comparison: an electrical broadcast across 64 cores requires 64 x 150 fJ/bit = 10 pJ/bit (~50x more power)
(Assumes 150 fJ/mm/bit, 1-mm spaced tiles)
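The power figures above can be reproduced from the per-bit energies alone. A minimal sketch using only the slide's stated parameters:

```python
# Reproduce the ATAC link-power arithmetic.
N = 64          # cores; one wavelength per core
WORD = 32       # bits sent in parallel (32 waveguides)
RATE = 1e9      # 1 Gbit/s per wavelength
E_RX = 2e-15    # receive energy, J/bit (~2 fJ/bit at -5 dB link loss)
E_MOD = 75e-15  # Ge-Si EA modulator energy incl. driver, J/bit

# Every transmitted bit is dropped at all receivers: N senders x N drops.
receive_power = E_RX * RATE * WORD * N**2        # ~0.262 W = 262 mW
# Each bit is modulated only once, by its sender.
modulator_power = E_MOD * RATE * WORD * N        # ~0.154 W (slide rounds to 153)
# Per broadcast bit: one modulation plus 63 receptions.
energy_per_bit = E_MOD + E_RX * (N - 1)          # 201 fJ/bit

# Electrical comparison: 150 fJ/mm/bit driven across 64 tiles at 1-mm pitch.
electrical_per_bit = 64 * 150e-15                # 9.6 pJ/bit, i.e. ~50x more

print(f"receive power:   {receive_power * 1e3:.0f} mW")
print(f"modulator power: {modulator_power * 1e3:.0f} mW")
print(f"energy/bit:      {energy_per_bit * 1e15:.0f} fJ")
print(f"electrical/optical ratio: ~{electrical_per_bit / energy_per_bit:.0f}x")
```

Note the scaling: modulator power grows linearly in N while receive power grows as N^2, yet the energy per broadcast bit grows only as 75 + 2(N-1) fJ, which is what keeps the optical broadcast ~50x cheaper than repeating the bit electrically across 64 tiles.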