event-driven computing · (elektor 26 may 2015) prime 18 january 2018 5. poets...has bought about...

34
PRiME 18 January 2018 1 POETS Event - driven computing Andrew Brown Southampton [email protected] David Thomas Imperial College [email protected] Simon Moore Cambridge [email protected] Andrey Mokhov Newcastle [email protected]

Upload: others

Post on 13-Mar-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Event-driven computing · (Elektor 26 May 2015) PRiME 18 January 2018 5. POETS...has bought about massive changes: • Hardware increasingly statically and dynamically unreliable

PRiME18 January 2018 1

POETS

Event-driven computing

Andrew Brown Southampton

[email protected]

David Thomas Imperial College

[email protected]

Simon MooreCambridge

[email protected]

Andrey MokhovNewcastle

[email protected]

Page 2: Event-driven computing · (Elektor 26 May 2015) PRiME 18 January 2018 5. POETS...has bought about massive changes: • Hardware increasingly statically and dynamically unreliable

PRiME18 January 2018 2

POETSPartners

Page 3: Event-driven computing · (Elektor 26 May 2015) PRiME 18 January 2018 5. POETS...has bought about massive changes: • Hardware increasingly statically and dynamically unreliable

PRiME18 January 2018 3

POETSOutline

• POETS - the concept• The realisation• A panorama of applications• Down amongst the hard sums• A brief meander into reliability

Page 4: Event-driven computing · (Elektor 26 May 2015) PRiME 18 January 2018 5. POETS...has bought about massive changes: • Hardware increasingly statically and dynamically unreliable

PRiME18 January 2018 4

POETS

The rise of the many-core exponent is just beginning

Cores are the new transistor

–Silicon•10um .. 10 nm•210 reduction in

feature size•220 increase in

density•~20 process

generations

(Elektor 26 May 2015)

Page 5: Event-driven computing · (Elektor 26 May 2015) PRiME 18 January 2018 5. POETS...has bought about massive changes: • Hardware increasingly statically and dynamically unreliable

PRiME18 January 2018 5

POETS

...has bought about massive changes:

• Hardware increasingly statically and dynamically unreliable

• Cores are free• Communication costs ~ 1000 x compute costs

Fabrication technology...

Page 6: Event-driven computing · (Elektor 26 May 2015) PRiME 18 January 2018 5. POETS...has bought about massive changes: • Hardware increasingly statically and dynamically unreliable

PRiME18 January 2018 6

POETSSimulation: The old way• Build a model:

– Probably some sort of graph– Local interactions

• Process the model:– Linearise and shove it through some (small) set of

processors

• Big problems:– A lot of fetching– A lot of shoving– A lot of writing back

Page 7: Event-driven computing · (Elektor 26 May 2015) PRiME 18 January 2018 5. POETS...has bought about massive changes: • Hardware increasingly statically and dynamically unreliable

PRiME18 January 2018 7

POETSThe POETS way• Build a model:

– Probably some sort of graph– Local interactions

• Process the model:– Distribute processors over the model

• Big problems:– No fetching

• Wherever there's data, there's a processor– Little shoving

• Things interact with their neighbours in parallel– No writing back

• Data is stored locally to the generation site

Interaction via small, fastmessages

Page 8: Event-driven computing · (Elektor 26 May 2015) PRiME 18 January 2018 5. POETS...has bought about massive changes: • Hardware increasingly statically and dynamically unreliable

PRiME18 January 2018 8

POETS

...needs to change in response to this plethora of cores:

• Software must cope with– Unknown and changing physical core topologies– Non-deterministic message passing– Localised data

• Dramatic performance improvements– For certain classes of problems

• Underlying mathematics still valid– Algorithmic embodiment different

Software design...

Page 9: Event-driven computing · (Elektor 26 May 2015) PRiME 18 January 2018 5. POETS...has bought about massive changes: • Hardware increasingly statically and dynamically unreliable

PRiME18 January 2018 9

POETSPOETS

Partially Ordered Event Triggered Systems

We learnt so much

from Steve

Inspired by SpiNNaker

Page 10: Event-driven computing · (Elektor 26 May 2015) PRiME 18 January 2018 5. POETS...has bought about massive changes: • Hardware increasingly statically and dynamically unreliable

PRiME18 January 2018 10

POETSIt's all about speed

Wallclock time send-receive

Iridis 3 core-core inter node

1.8 Gbyte/s

Iridis 3 core-core intra node

4.0 Gbyte/s

SpiNNaker0.03 Gbyte/s

Event-based machines - lightly loaded networksMessage size

(bytes)

3 us 12 us600 ns276 ns100 ns44 ns

Zone of event-based

computingoperation

POETSinter node1.0 Gbyte/s

POETSintra node4.4 Gbyte/s

Conventional machines -massive payloads

Page 11: Event-driven computing · (Elektor 26 May 2015) PRiME 18 January 2018 5. POETS...has bought about massive changes: • Hardware increasingly statically and dynamically unreliable

PRiME18 January 2018 11

POETS

2.2GHz single core desktop

200MHz 48 node

SpiNNaker engine

The reward...• Dramatic performance improvements

– For certain classes of problemsO(n?) O(1)

Page 12: Event-driven computing · (Elektor 26 May 2015) PRiME 18 January 2018 5. POETS...has bought about massive changes: • Hardware increasingly statically and dynamically unreliable

PRiME18 January 2018 12

POETSImagine a system...

• Arbitrarily large number of cores– Cores are free

• Core-core communications cheap– We designed it that way

• What might we do with it?• How might we do anything with it?

Page 13: Event-driven computing · (Elektor 26 May 2015) PRiME 18 January 2018 5. POETS...has bought about massive changes: • Hardware increasingly statically and dynamically unreliable

PRiME18 January 2018 13

POETSOutline

• POETS - the concept• The realisation• A panorama of applications• Down amongst the hard sums• A brief meander into reliability

Page 14: Event-driven computing · (Elektor 26 May 2015) PRiME 18 January 2018 5. POETS...has bought about massive changes: • Hardware increasingly statically and dynamically unreliable

PRiME18 January 2018 14

POETSA POETS board

Inter-board router

Fast application port

4MB/s UART link to each core

1 core contains LogThreadsPerCore (24)

threads

1 mailbox attaches to LogCoresPerMailbox

(22) cores1 board contains

LogMailboxesPerBoard(24) mailboxes

Cached off-FPGA memory: 4 Gbyte I+D

1 core has LogInstrsPerCore (* 4 BYTESPERINSTR): 211*4 = 8k I-memory;

All threads in a core share I-space

All cores have access to the RAM

UART port

Bidirectional mailbox interconnect

Page 15: Event-driven computing · (Elektor 26 May 2015) PRiME 18 January 2018 5. POETS...has bought about massive changes: • Hardware increasingly statically and dynamically unreliable

PRiME18 January 2018 15

POETSA box of boards

1 box contains9 x DES-5 FPGA workers1 x DES-5 FPGA bridge

1 x PC mothership

Arbitrary topology 10Gb/s application network Exactly one 10

Gb/s link

Orchestrator plus peripheral process

hosts

1 worker contains~100 cores/FPGA ==>

~900 cores/box ==>14400 threads/box

Core clock = 275 MHz ==>thread clock = 17.2 MHz ==>

device clock == 17.2 kHz

Command and control network

Box

10 Gb/s links to other boxes 4 Mb/s UART links

Bridge board

Box

Mother ship(PC)

DES-5 FPGA board

(TByte SSD)

>1 GByte/s PCIe link MPI

backplane

Page 16: Event-driven computing · (Elektor 26 May 2015) PRiME 18 January 2018 5. POETS...has bought about massive changes: • Hardware increasingly statically and dynamically unreliable

PRiME18 January 2018 16

POETSThe penteract

Page 17: Event-driven computing · (Elektor 26 May 2015) PRiME 18 January 2018 5. POETS...has bought about massive changes: • Hardware increasingly statically and dynamically unreliable

PRiME18 January 2018 17

POETSThe reality

Page 18: Event-driven computing · (Elektor 26 May 2015) PRiME 18 January 2018 5. POETS...has bought about massive changes: • Hardware increasingly statically and dynamically unreliable

PRiME18 January 2018 18

POETSPrototype: Tinsel

• Tinsel = multithreaded POETS processor• 32-bit multithreaded RISC-V• 64 cores x 16 threads/Tinsel = 1024 P-cores• 16 caches, 16 mailboxes, 2 DDR3 controllers• 40% of DE5-NET FPGA board• Clocks at over ~300MHz (fast for a softcore)• Passes RISC-V regression test suite

Page 19: Event-driven computing · (Elektor 26 May 2015) PRiME 18 January 2018 5. POETS...has bought about massive changes: • Hardware increasingly statically and dynamically unreliable

PRiME18 January 2018 19

POETSHardware is nothing without software...

Page 20: Event-driven computing · (Elektor 26 May 2015) PRiME 18 January 2018 5. POETS...has bought about massive changes: • Hardware increasingly statically and dynamically unreliable

PRiME18 January 2018 20

POETSOutline

• POETS - the concept• The realisation• A panorama of applications• Down amongst the hard sums• A brief meander into reliability

Page 21: Event-driven computing · (Elektor 26 May 2015) PRiME 18 January 2018 5. POETS...has bought about massive changes: • Hardware increasingly statically and dynamically unreliable

PRiME18 January 2018 21

POETSWhat sort of problem?

• Anything that can be transformed into– Large number of simple processes– Asynchronous short-range communication

Page 22: Event-driven computing · (Elektor 26 May 2015) PRiME 18 January 2018 5. POETS...has bought about massive changes: • Hardware increasingly statically and dynamically unreliable

PRiME18 January 2018 22

POETSOutline

• POETS - the concept• The realisation• A panorama of applications• Down amongst the hard sums• A brief meander into reliability

Page 23: Event-driven computing · (Elektor 26 May 2015) PRiME 18 January 2018 5. POETS...has bought about massive changes: • Hardware increasingly statically and dynamically unreliable

PRiME18 January 2018 23

POETSA real problem....

( ))(1

)()(12

)()1( 2 ni

ni

ni

i

ni

ni uuu

xtDuu +−

+ +−∂∂

+=

• Heat equation

– becomes

2

2

xuD

tu

∂∂

=∂∂

nth timestep

dxi are different because the mesh is irregular

simulated time t = ndt

nth timestep

Page 24: Event-driven computing · (Elektor 26 May 2015) PRiME 18 January 2018 5. POETS...has bought about massive changes: • Hardware increasingly statically and dynamically unreliable

PRiME18 January 2018 24

POETSNumerical methods

Gauss-Seidl:x(m+1) = b - Lx(m+1) - Ux(m)

Convergence fast: newest values used as soon as they become available

Predicate tree thin - not parallelisable

Jacobi:x(m+1) = b + (I - A)x(m)

Convergence slow: entire old value set used to generate entire new value set

Predicate tree wide - easily parallelisable

New value being calculated

New values used as soon as they

are available

Previous value - no update yet available

New value being calculated

Only old values used

Page 25: Event-driven computing · (Elektor 26 May 2015) PRiME 18 January 2018 5. POETS...has bought about massive changes: • Hardware increasingly statically and dynamically unreliable

PRiME18 January 2018 25

POETSEvent-driven methodSolution trajectory (programmer-defined)

- sequential

New values used as soon as they are available

Previous value - no update yet available

New value being

calculated

Only old values used

(m)

x1

x2

x3

x4

x5

x6

x7

(m+1)

x1

x2

x3

x4

x5

x6

x7

Gauss-Seidl:(m)

x1

x2

x3

x4

x5

x6

x7

(m+1)

x1

x2

x3

x4

x5

x6

x7

Jacobi(m)

x1

x2

x3

x4

x5

x6

x7

(m+1)

x1

x2

x3

x4

x5

x6

x7

Event-drivenSolution trajectory (non-deterministic)

- massively parallel

Values updated randomly IN PARALLEL

Page 26: Event-driven computing · (Elektor 26 May 2015) PRiME 18 January 2018 5. POETS...has bought about massive changes: • Hardware increasingly statically and dynamically unreliable

PRiME18 January 2018 26

POETS

2.2GHz desktop

200MHz 48 node

SpiNNaker engine

Solution times

Page 27: Event-driven computing · (Elektor 26 May 2015) PRiME 18 January 2018 5. POETS...has bought about massive changes: • Hardware increasingly statically and dynamically unreliable

PRiME18 January 2018 27

POETSSynfire rings

Artificial “biological”

neural structures

Page 28: Event-driven computing · (Elektor 26 May 2015) PRiME 18 January 2018 5. POETS...has bought about massive changes: • Hardware increasingly statically and dynamically unreliable

PRiME18 January 2018 28

POETSOutline

• POETS - the concept• The realisation• A panorama of applications• Down amongst the hard sums• A brief meander into reliability

Page 29: Event-driven computing · (Elektor 26 May 2015) PRiME 18 January 2018 5. POETS...has bought about massive changes: • Hardware increasingly statically and dynamically unreliable

PRiME18 January 2018 29

POETSReliable computing on unreliable substrates

Finite difference heat diffusion canonical 2D square grid:

Diagonal temperature profile vs iteration

Page 30: Event-driven computing · (Elektor 26 May 2015) PRiME 18 January 2018 5. POETS...has bought about massive changes: • Hardware increasingly statically and dynamically unreliable

PRiME18 January 2018 30

POETSWhen things break:

– System with 1000000 cores

– Unrealistic to expect 100% uptime

– What happens when cores die?

– Algorithm 'self-heals' around unresponsive core

Page 31: Event-driven computing · (Elektor 26 May 2015) PRiME 18 January 2018 5. POETS...has bought about massive changes: • Hardware increasingly statically and dynamically unreliable

PRiME18 January 2018 31

POETS5% device death

Diagonal temperature profile vs iteration

Ping-pong numerical instability

appears to be real

Results degrade gracefully

Page 32: Event-driven computing · (Elektor 26 May 2015) PRiME 18 January 2018 5. POETS...has bought about massive changes: • Hardware increasingly statically and dynamically unreliable

PRiME18 January 2018 32

POETSOn the nature and consequence of failures

• Generic electronic system:– Stuck-at, bridging, crosstalk....– Arbitrary effects propagate - usually disabling

• Event-based architectures:– Interconnect complex– Core/node fails silently– Affected mesh sites disappear from model– Solution algorithm unaffected

Page 33: Event-driven computing · (Elektor 26 May 2015) PRiME 18 January 2018 5. POETS...has bought about massive changes: • Hardware increasingly statically and dynamically unreliable

PRiME18 January 2018 33

POETS

A desktop machine ...

- a personal computer -

... will have 25000 cores

By 2025...

Page 34: Event-driven computing · (Elektor 26 May 2015) PRiME 18 January 2018 5. POETS...has bought about massive changes: • Hardware increasingly statically and dynamically unreliable

PRiME18 January 2018 34

POETS