A 1024-core 70GFLOP/W Floating Point Manycore Microprocessor

Andreas Olofsson, Roman Trogan, Oleg Raikhman

Adapteva, Lexington, MA

ANSI C Programmable, IEEE Floating Point, 70 GFLOPS/W

                                      E1G3   E16G3    E1G4   E16G4   E64G4  E256G4   E1KG4   E4KG4
Cores                                    1      16       1      16      64     256    1024    4096
Process Geometry                       65G     65G    28LP    28LP    28LP    28LP    28LP    28LP
Max Frequency (MHz)                   1000    1000     700     700     700     700     700     700
Performance (GFLOPS)                     2      32     1.4      22      88     352    1408    5632
Performance (CoreMark)                1288 1288*16     900  900*16  900*64 900*256  900*1k  900*4k
Peak Energy Efficiency (GFLOPS/W)       35      35      70      70      70      70      70      70
Total Area (mm²)                       0.5    8.96    0.13    2.05     8.2    32.7   131.1   524.3
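The performance row can be sanity-checked with simple arithmetic. The sketch below assumes each core retires 2 floating point operations per cycle; that figure is not stated on the poster and is inferred from the E1G3 column (1 core at 1000 MHz listed at 2 GFLOPS). The names `TABLE`, `peak_gflops` and `relative_gap` are made up for this illustration.

```python
# Rows copied from the product table: name -> (cores, max MHz, listed GFLOPS)
TABLE = {
    "E1G3": (1, 1000, 2),
    "E16G3": (16, 1000, 32),
    "E1G4": (1, 700, 1.4),
    "E16G4": (16, 700, 22),
    "E64G4": (64, 700, 88),
    "E256G4": (256, 700, 352),
    "E1KG4": (1024, 700, 1408),
    "E4KG4": (4096, 700, 5632),
}

# Assumption (not stated in the source): 2 FLOPs per core per cycle,
# inferred from E1G3 = 2 GFLOPS at 1000 MHz on a single core.
FLOPS_PER_CYCLE = 2


def peak_gflops(cores, mhz, flops_per_cycle=FLOPS_PER_CYCLE):
    """Estimated peak GFLOPS = cores x clock (MHz) x FLOPs per cycle / 1000."""
    return cores * mhz * flops_per_cycle / 1000.0


def relative_gap(name):
    """Relative difference between the estimate and the listed table value."""
    cores, mhz, listed = TABLE[name]
    estimate = peak_gflops(cores, mhz)
    return abs(estimate - listed) / estimate


if __name__ == "__main__":
    for name, (cores, mhz, listed) in TABLE.items():
        print(f"{name:7s} listed={listed:7} estimate={peak_gflops(cores, mhz):8.1f} "
              f"gap={relative_gap(name):.1%}")
```

Under this assumption every listed value is within about 2% of the estimate. The larger 28nm arrays are listed slightly below it, which would be consistent with rounding in the table, though the poster does not say so.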

[Figure: Energy Efficiency (28nm). Axes: GFLOPS/W and GFLOPS. Annotation: 20-100W.]

Epiphany Introduction

Accelerator Model

A More Balanced Approach?

64 CORES @ 2 WATTS

28nm IP Offering

Programming Model

Silicon Measurements

Built to Scale

1024 Cores 1 Core

Architecture designed from scratch to scale

LEGO approach to chip design

Array generator minimizes chip design

1024 cores easily manufactured in 28nm
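One way to see the scaling claim in numbers is to divide the total area in the product table by the core count for each 28nm part. The sketch below only re-uses the table values; the helper names `area_per_core` and `max_spread` are hypothetical, and the choice of E64G4 as the reference point is arbitrary.

```python
# 28nm rows copied from the product table: name -> (cores, total area in mm^2)
AREA_28NM = {
    "E1G4": (1, 0.13),
    "E16G4": (16, 2.05),
    "E64G4": (64, 8.2),
    "E256G4": (256, 32.7),
    "E1KG4": (1024, 131.1),
    "E4KG4": (4096, 524.3),
}


def area_per_core(name):
    """Total listed area divided by the number of cores, in mm^2."""
    cores, total_mm2 = AREA_28NM[name]
    return total_mm2 / cores


def max_spread(reference_mm2):
    """Largest relative deviation of any part's per-core area from a reference."""
    return max(abs(area_per_core(n) - reference_mm2) / reference_mm2
               for n in AREA_28NM)


if __name__ == "__main__":
    reference = area_per_core("E64G4")
    for name in AREA_28NM:
        print(f"{name:7s} {area_per_core(name):.4f} mm^2 per core")
    print(f"max deviation from E64G4: {max_spread(reference):.1%}")
```

The per-core area stays within about 2% of the E64G4 value across all six parts, so the listed total area grows essentially linearly with core count.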

100% C/C++ Programmable

Architecture is programming model agnostic

Could support SIMD, MIMD, Threading, Message Passing, Functional Programming.

Focusing on message passing model
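As a generic illustration of the message passing model (not Epiphany SDK code, and written in Python rather than C for brevity), the sketch below splits a dot product across several workers that share no state and report partial results as messages to a collector. Threads and `queue.Queue` merely stand in for cores and an on-chip network; `worker` and `dot_by_message_passing` are hypothetical names.

```python
import queue
import threading


def worker(core_id, a_slice, b_slice, outbox):
    """Compute a partial dot product locally, then send it as a message."""
    partial = sum(x * y for x, y in zip(a_slice, b_slice))
    outbox.put((core_id, partial))


def dot_by_message_passing(a, b, n_cores=4):
    """Dot product where workers communicate only through messages."""
    inbox = queue.Queue()                      # collector's mailbox
    chunk = (len(a) + n_cores - 1) // n_cores  # ceiling division
    threads = []
    for core_id in range(n_cores):
        lo, hi = core_id * chunk, (core_id + 1) * chunk
        t = threading.Thread(target=worker,
                             args=(core_id, a[lo:hi], b[lo:hi], inbox))
        t.start()
        threads.append(t)

    total = 0.0
    for _ in range(n_cores):                   # one message per worker
        _core_id, partial = inbox.get()
        total += partial
    for t in threads:
        t.join()
    return total


if __name__ == "__main__":
    print(dot_by_message_passing([1, 2, 3, 4, 5, 6, 7, 8], [1] * 8))
```

Each worker touches only its own slice and the only shared object is the mailbox, which is the property that lets this style of program map onto an array of independent cores.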

Benchmarks

CoreMark C-Benchmark

Matrix Multiplication (Assembly 90%)

Software Development Kit

64 Cores

800 MHz

102 GFLOPS

70 GFLOP/W

3mm x 4mm

<2 Watt Chip Power

28nm Process

Available Q1 2012
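The 64-core figures above can be cross-checked against each other. The sketch below assumes 2 floating point operations per core per cycle (not stated on the poster; it is the rate that makes 64 cores at 800 MHz come out near the quoted 102 GFLOPS) and assumes the performance and efficiency figures refer to the same operating point. The function names are hypothetical.

```python
# Assumption (not stated in the source): 2 FLOPs per core per cycle.
FLOPS_PER_CYCLE = 2


def peak_gflops(cores, mhz, flops_per_cycle=FLOPS_PER_CYCLE):
    """Estimated peak GFLOPS = cores x clock (MHz) x FLOPs per cycle / 1000."""
    return cores * mhz * flops_per_cycle / 1000.0


def implied_power_watts(gflops, gflops_per_watt):
    """Power implied by a performance figure and an efficiency figure."""
    return gflops / gflops_per_watt


# 64 cores at 800 MHz: estimate is 102.4 GFLOPS, quoted as 102 GFLOPS.
peak = peak_gflops(64, 800)

# 102 GFLOPS at 70 GFLOP/W implies roughly 1.46 W, under the quoted 2 W.
power = implied_power_watts(102, 70)

if __name__ == "__main__":
    print(f"estimated peak: {peak:.1f} GFLOPS")
    print(f"implied power:  {power:.2f} W")
```

Under these assumptions the quoted performance, efficiency and "<2 Watt" figures are mutually consistent.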